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Abstract. Wc introduce vector diffusion maps (VDM), a new mathematical framework for orga- 
nizing and analyzing massive high dimensional data sets, images and shapes. VDM is a mathematical 
and algorithmic generalization of diffusion maps and other non-linear dimensionality reduction meth- 
ods, such as LLE, ISOMAP and Laplacian eigenmaps. While existing methods are either directly or 
indirectly related to the heat kernel for functions over the data, VDM is based on the heat kernel 
for vector fields. VDM provides tools for organizing complex data sets, embedding them in a low di- 
mensional space, and interpolating and regressing vector fields over the data. In particular, it equips 
the data with a metric, which we refer to as the vector diffusion distance. In the manifold learning 
setup, where the data set is distributed on (or near) a low dimensional manifold M'^ embedded in 
MP, we prove the relation between VDM and the connection-Laplacian operator for vector fields over 
the manifold. 
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1. Introduction. A popular way to describe the affinities between data points is 
using a weighted graph, whose vertices correspond to the data points, edges that con- 
nect data points with large enough affinities and weights that quantify the affinities. 
In the past decade we have been witnessed to the emergence of non-linear dimension- 
ality reduction methods, such as locally linear embedding (LLE) [33], ISOMAP [39] . 
Hessian LLE [121, Laplacian eigenmaps [2] and diffusion maps [9]. These methods use 
the local affinities in the weighted graph to learn its global features. They provide 
invaluable tools for organizing complex networks and data sets, embedding them in a 
low dimensional space, and studying and regressing functions over graphs. Inspired by 
recent developments in the mathematical theory of cryo-electron microscopy [361 118) 
and synchronization [341 llOj , in this paper we demonstrate that in many applications, 
the representation of the data set can be vastly improved by attaching to every edge 
of the graph not only a weight but also a linear orthogonal transformation (see Figure 

Consider, for example, a data set of images, or small patches extracted from 
images (see, e.g., [171|S]). While weights are usually derived from the pairwise com- 
parison of the images in their original representation, we instead associate the weight 
Wij to the similarity between image i and image j when they are optimally rotation- 
ally aligned. The dissimilarity between images when they are optimally rotationally 
aligned is sometimes called the rotationally invariant distance [3T] . We further define 
the linear transformation Oij as the 2x2 orthogonal transformation that registers the 
two images (see Figure 1.2). Similarly, for data sets consisting of three-dimensional 
shapes, Oij encodes the optimal 3x3 orthogonal registration transformation. In the 
case of manifold learning, the linear transformations can be constructed using local 
principal component analysis (PCA) and alignment, as discussed in Section [2] 

While diffusion maps and other non-linear dimensionality reduction methods are 
either directly or indirectly related to the heat kernel for functions over the data, 



*Department of Mathematics and PACM, Princeton University, Fine Hall, Washington Road, 
Princeton NJ 08544-1000 USA, email; amits@math.princeton.edu 

t Department of Mathematics, Princeton University, Fine Hall, Washington Road, Princeton NJ 
08544-1000 USA, email: hauwu@math.princeton.edu 



1 



Fig. 1.1. In VDM, the relationships between data points are represented as a weighted graph, 
where the weights Wij are accompanied by linear orthogonal transformations Oij . 






(a) li 



(b) Ij 



(c) 4 



Fig. 1.2. An example of a weighted graph with orthogonal transformations: li and Ij are two 
different images of the digit one, corresponding to nodes i and j in the graph. Oij is the 2x2 
rotation matrix that rotationally aligns Ij with li and Wij is some measure for the affinity between 
the two images when they are optimally aligned. The affinity Wij is large, because the images li and 
Oijij are actually the same. On the other hand, Ii^ is an image of the digit two, and the discrepancy 
between 4 and li is large even when these images are optimally aligned. As a result, the affinity 
Wii^ would be small, perhaps so .small that there is no edge in the graph connecting nodes i and k. 
The matrix Ojfc is clearly not as meaningful as Oij . If there is no edge between i and k, then Oi^ 
is not represented in the weighted graph. 



our VDM framework is based on the heat kernel for vector fields. We construct this 
kernel from the weighted graph and the orthogonal transformations. Through the 
spectral decomposition of this kernel, VDM defines an embedding of the data in a 
Hilbert space. In particular, it defines a metric for the data, that is, distances between 
data points that we call vector diffusion distances. For some applications, the vector 
diffusion metric is more meaningful than currently used metrics, since it takes into 
account the linear transformations, and as a result, it provides a better organization of 
the data. In the manifold learning setup, we prove a convergence theorem illuminating 
the relation between VDM and the connection-Laplacian operator for vector fields over 
the manifold. 



The paper is organized in the following way: In Section[2]we describe the manifold 
learning setup and a procedure to extract the orthogonal transformations from a point 
cloud scattered in a high dimensional Euclidean space using local PCA and alignment. 
In Section [3] we specify the vector diffusion mapping of the data set into a finite di- 
mensional Hilbert space. At the heart of the vector diffusion mapping construction 
lies a certain symmetric matrix that can be normalized in slightly different ways. Dif- 
ferent normalizations lead to different embcddings, as discussed in Section [4j These 
normalizations resemble the normalizations of the graph Laplacian in spectral graph 
theory and spectral clustering algorithms. In the manifold learning setup, it is known 
that when the point cloud is uniformly sampled from a low dimensional Riemannian 
manifold, then the normalized graph Laplacian approximates the Laplace-Beltrami 
operator for scalar functions. In Section [5] we formulate a similar result, stated as 
Theorem |5.1[ for the convergence of the appropriately normalized vector diffusion 
mapping matrix to the connection-Laplacian operator for vector fields The proof of 
Theorem 5.1 appears in Appendix [B] We verified Theorem |5 . 1 1 numerically for spheres 
of different dimensions, as reported in Section [6] and Appendix [C] We also used other 
surfaces to perform numerical comparisons between the vector diffusion distance, the 
diffusion distance, and the geodesic distance. In Section [7] we briefly discuss out-of- 
sample extrapolation of vector fields via the Nystrom extension scheme. The role 
played by the heat kernel of the connection-Laplacian is discussed in Section [Sj We 
use the well known short time asymptotic expansion of the heat kernel to show the re- 
lationship between vector diffusion distances and geodesic distances for nearby points. 
In Section |9] we briefly discuss the application of VDM to cryo-electron microscopy, as 
a prototypical multi-reference rotational alignment problem. We conclude in Section 



10 with a summary followed by a discussion of some other possible applications and 



extensions of the mathematical framework. 



2. Data sampled from a Riemannian manifold. One of the main objectives 
in the analysis of a high dimensional large data set is to learn its geometric and 
topological structure. Even though the data itself is parameterized as a point cloud 
in a high dimensional ambient space W, the correlation between parameters often 
suggests the popular "manifold assumption" that the data points are distributed on 
(or near) a low dimensional Riemannian manifold Ai'^ embedded in M.P, where d is the 
dimension of the manifold and d <^ p. Suppose that the point cloud consists of n data 
points a;i, a;2, . . . , a;„ that are viewed as points in W but are restricted to the manifold. 
We now describe how the orthogonal transformations Oij can be constructed from the 
point cloud using local PCA and alignment. 

Local PCA. For every data point Xi we suggest to estimate a basis to the tangent 
plane T^^M to the manifold at Xi using the following procedure which we refer to as 
local PCA. We fix a scale parameter epcA > and define J^xi,<LpcA ^ the neighbors 
of Xi inside a ball of radius y/epcA centered at xf. 



J^x,,tpcA = {Xj ■ < ll^^i - X^Wrp < ^Epca}- 



^One of the main considerations in the way this paper is presented was to make it as accessible 
as possible, also to readers who are not familiar with differential geometry. Although the connection- 
Laplacian is essential to the understanding of the mathematical framework that underlies VDM, 
and differential geometry is extensively used in Appendi x [B] fo r the proof of Theorem |5.1[ we do 
not assume knowledge of differential geometry in Sections |2|10| (except for some parts of Section [sJ 
that detail the algorithmic framework. The concepts of differential geometry that are required for 
achieving basic familiarity with the connection-Laplacian are explained in Appendix [A| 
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Denote the number of neighboring points of Xi h^^Ni, that is, Ni ~ l-^xi.epcAlt ^^'^ 
denote the neighbors of Xi by Xi-^^Xi^, ■ ■ . , x^^, . We assume that epcA is large enough 
so that Nj > d, but at the same time epcA is small enough such that Ni <C n. In 
Theorem ] 



B.l 



we show that a satisfactory choice for epcA is given by epcA = 0{n 
so that Ni — 0{n^). In fact, it is even possible to choose epcA — 0{n^^) if the 
manifold does not have a boundary. 

Observe that the neighboring points are located near T^.M, where deviations are 
possible either due to curvature or due to neighboring data points that lie slightly 
off the manifold. Define to be a p x Ni matrix whose j'th column is the vector 
Xij 2^2, that is, 

Ni — '^ii '^12 ■ ■ ■ '^^N- ] ■ 

In other words, Xi is the data matrix of the neighbors shifted to be centered at the 
point Xi. Notice, that while it is more common to shift the data for PCA by the mean 
l^i ~ W '^j=i ' ^^^^ shift the data by Xi. Shifting the data by /x^ is also possible 
for all practical purposes, but has the slight disadvantage of complicating the proof 



for the convergence of the local PCA step (see Appendix B.l I. 

Let Khe&C^ positive monotonic decreasing function with support on the interval 
[0,1], for example, the Epanechnikov kernel K{u) = (1 — v?)x[o,l\^ where x is the 
indicator function |^ Let Di be an iV, x Ni diagonal matrix with 



A(J,J) = Ji^('^^^=^), j = l,2,...,iV,. 



Define the p x Ni matrix Bi as 



Bi — NiDi 



That is, the j'th column of Bi is the vector (xi - —Xi) scaled by Di{j,j). The purpose 
of the scaling is to give more emphasis to nearby points over far away points (recall 
that K is monotonic decreasing). We denote the singular values of Bi by ai^i > (7i^2 > 

In many cases, the intrinsic dimension d is not known in advance and needs to 
be estimated directly from the data. If the neighboring points in J^xi,epcA ^-re located 
exactly on T^-Ad, then rankX^ = rankSj = d, and there are only d non- vanishing 
singular values (i.e., ai^d+i = ■ ■ ■ = (Ji.Ni = 0). In such a case, the dimension can 
be estimated as the number of non-zero singular values. In practice, however, due to 
the curvature effect, there may be more than d non-zero singular values. A common 
practice is to estimate the dimension as the number of singular values that account 
for high enough percentage of the variability of the data. That is, one sets a threshold 
7 between and 1 (usually closer to 1 than to 0), and estimates the dimension as the 
smallest integer di for which 



X^Ni 2 



> 7- 



Since Ni depends on epcA, it should be denoted as Ni^^p^^, but since epcA is kept fixed it 
is suppressed from the notation, a convention that we use except for cases in which confusion may 



arise 
3 



In fact, K can be chosen in a more general fashion, for example, monotonicity is not required for 
all theoretical purposes. However, in practice, a monotonic decreasing K leads to a better behavior 
of the PCA step. 
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For example, setting 7 ~ 0.9 means that di singular values account for at least 90% 
variability of the data, while di — 1 singular values account for less than 90%. We 
refer to the smallest integer di as the estimated local dimension of A4 at Xi. One 
possible way to estimate the dimension of the manifold would be to use the mean 
of the estimated local dimensions di, . . . , d„, that is, d = X]"=i '^i (and then round 
it to the closest integer). The mean estimator minimizes the sum of squared errors 
Y^7=ii'^i ~ ^)^- estimate the intrinsic dimension of the manifold by the median 
value of all the d^'s, that is, we define the estimator d for the intrinsic dimension d as 

d = median{c?i, ^2, • ■ • , d„}. 

The median has the property that it minimizes the sum of absolute errors X^ILi 
As such, estimating the intrinsic dimension by the median is more robust to outliers 
compared to the mean estimator. In all proceeding steps of the algorithm we use the 
median estimator d, but in order to facilitate the notation we write d instead of d. 
Suppose that the singular value decomposition (SVD) of Bi is given by 

The columns of the p x Ni matrix Ui are orthonormal and are known as the left 
singular vectors 

Ui — [ Uii ■ ' ' ] ■ 

We define the p x d matrix Oi by the first d left singular vectors (corresponding to 
the largest singular values): 

Oi = [ Ui^ Ui^ ■■ ■ Ui^ ] . (2.1) 

The d columns of Oi are orthonormal, i.e., OfOi = Idxd- The columns of Oi represent 
an orthonormal basis to a d-dimensional subspace of . This basis is a numerical 
approximation to an orthonormal basis of the tangent plane T^-M. The order of the 
approximation (as a function of epcA a-nd n) is established later, using the fact that 
the columns of Oi are also the eigenvectors (corresponding to the d largest eigenvalues) 
of the p X p covariance matrix given by 

= (^fcg^) (x,^ - x,)(x,^. - x,f. (2.2) 

Since K is supported on the interval [0, 1] the covariance matrix Sj can also be rep- 
resented as 

E^^pK (^fcM^) {x, - x,){x, - x^f. (2.3) 

We emphasize that the covariance matrix is never actually formed due to its excessive 
storage requirements, and all computations are performed with the matrix Bi. We 
remark that it is also possible to estimate the intrinsic dimension d and the basis Oi 
using the multiscaled PCA algorithm J8J that uses several different values of epcA 
for a given Xi , but here we try to make our approach as simple as possible while being 
able to later prove convergence theorems. 
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Alignment. Suppose Xi and xj are two nearby points whose Euclidean distance 
satisfies lla;^ — Xj Ukp < where e > is a scale parameter different from the scale 
parameter epcA- In fact, e is much larger than epcA as we later choose e = 0{n^'^), 
while, as mentioned earlier, epcA = 0(n~3+T) (manifolds with boundary) or epcA = 
0(ri~3+2) (manifolds with no boundary). In any case, e is small enough so that the 
tangent spaces T^^M. and T^.M are also close|^ Therefore, the column spaces of Oi 
and Oj are almost the same. If the subspaces were to be exactly the same, then 
the matrices Oi and Oj would have differ by a d x orthogonal transformation Oij 
satisfying OiOij = Oj, or equivalently Oij = OfOj. In that case, OjOj is the matrix 
representation of the operator that transport vectors from to T^^A^, viewed 

as copies of IR''. The subspaces, however, are usually not exactly the same, due to 
curvature. As a result, the matrix Oj Oj is not necessarily orthogonal, and we define 
Oij as its closest orthogonal matrix, i.e., 

= argmin \\0 - OJO,\\hs, (2.4) 

where || • \\hs is the Hilbert-Schmidt norm (given by = Ty{AA^) for any real 

matrix A) and 0{d) is the set of orthogonal dx d matrices. This minimization problem 
has a simple solution [H US (23 H] via the SVD of OjO,. Specifically, if 

OjOj = UY.V'^ 

is the SVD of OfOj, then Oij is given by 

O,, = UV^. 

We refer to the process of finding the optimal orthogonal transformation between 
bases as alignment. Later we show that the matrix Oij is an approximation to the 
parallel transport operator (see Appendix [A|) from T^jAi to T^.Ai whenever Xi and 
Xj are nearby. 

Note that not all bases are aligned; only the bases of nearby points are aligned. 
We set E to be the edge set of the undirected graph over n vertices that correspond 
to the data points, where an edge between i and j exists iff their corresponding bases 
are aligned by the algorithirj^ (or equivalently, iff < ||a;i — Xj Urp < y/e). The weights 
Wij are defined using a kernel function if as 

w.,=k( ^^^^"^/^A , (2.5) 



where we assume that K is supported on the interval [0, 1]. For example, the Gaussian 

kernel K{u) = exp{—u'^}x[o^i] leads to weights of the form Wij = exp{— ^^^'""^^^^ } for 
< \\xi ~ XjW < ^ye and otherwise. We emphasize that the kernel K used for the 
definition of the weights Wij could be different than the kernel used for the previous 
step of local PGA. 



*In the sense that their Grassmannian distance given approximately by the operator norm 
\\OiOf - OjOjW is small. 

^We do not align a basis with itself, so the edge set E does not contain self loops of the form 

^Notice that the weights are only a function of the Euclidean distance between data points; 
another possibility, which we do not consider in this paper, is to include the Grassmannian distance 
\\OiOf — OjOj\\2 into the definition of the weight. 



Fig. 2.1. The orthonurrnal basis of the tangent plane Tx^M is determined by local PCA using 
data points inside a Euclidean ball of radius ^/epcX centered at Xi. The bases for Tx^M and Tx^Ai 
are optimally aligned by an orthogonal transformation Oij that can be viewed as a mapping from 
Tx^M toTx.M. 

3. Vector diffusion mapping. We construct the following matrix S: 

That is, S' is a block matrix, with n x n blocks, each of which is of size d x d. Each 
block is either a dx d orthogonal transformation Oij multiplied by the scalar weight 
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Wij, or a zero dxd matrix|^ The matrix S is symmetric since Ofj — Oji and Wij — Wji, 
and its overall size is nd x nd. We define a diagonal matrix D of the same size, where 
the diagonal blocks are scalar matrices given by 

D{i,i)^deg{t)Idxd, (3.2) 

and 

deg(i) = ^ w^j (3.3) 

is the weighted degree of node i. The matrix D^^S can be applied to vectors v of 
length nd, which we regard as n vectors of length d, such that v{i) is a vector in 
viewed as a vector in T^^M. The matrix D^^S is an averaging operator for vector 
fields, since 



This implies that the operator D transport vectors from the tangent spaces A4 
(that are nearby to T^-Ai) to Tx^A4 and then averages the transported vectors in 

Notice that diffusion maps and other non-linear dimensionality reduction methods 
make use of the weight matrix W = (wij)^ j^i, but not of the transformations Oij. In 
diffusion maps, the weights are used to define a discrete random walk over the graph, 
where the transition probability in a single time step from node i to node j is 
given by 



(3.5) 



deg(i) 

The Markov transition matrix A — {aij)2j^i can be written as 

A = V-^W, (3.6) 

where V is n x n diagonal matrix with 

2?(z,z) =deg(i). (3.7) 

While A is the Markov transition probability matrix in a single time step, A' is the 
transition matrix for t steps. In particular, A^{i,j) sums the probabilities of all paths 
of length t that start at i and end at j. Coifman and Lafon [51 US] showed that A* can 
be used to define an inner product in a Hilbert space. Specifically, the matrix A is simi- 
lar to the symmetric matrix V-^/^WV"^/'^ through A = V-^I'^{J)-^I'^WV-^I'^)V^I'^ . 
It follows that A has a complete set of real eigenvalues and eigenvectors {[i.iYi=\ ^^'^ 
{'/'illLi: respectively, satisfying A(^i = Their diffusion mapping $t is given by 

<^t(i) = (Ai^0l(^),M202(^),...,MUn(^)), (3.8) 



^As mentioned in the previous footnote, the edge set does not contain self-loops, so wa = and 
S{i,i) = Odxd- 
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where (i) is the i'th entry of the eigenvector (f)i . The mapping $( satisfies 



y 4^ ^'^^''^^ = {Mi), (3.9) 

where (•, •) is the usual dot product over Euclidean space. The metric associated to 
this inner product is known as the diffusion distance. The diff^usion distance doM.f (*, j) 
between i and j is given by 



^{A\i, k)-A\ j,k)r 

deg(fc) 

(3.10) 

Thus, the diffusion distance between i and j is the weighted-^2 proximity between the 
probability clouds of random walkers starting at i and j after t steps. 

In the VDM framework, we define the affinity between i and j by considering 
all paths of length t connecting them, but instead of just summing the weights of all 
paths, we sum the transformations. A path of length t from j to i is some sequence 
of vertices jq , ji, . . . , jt with Jq — j and ji = i and its corresponding orthogonal 
transformation is obtained by multiplying the orthogonal transformations along the 
path in the following order: 

Onj,.r---Oj,j,Oj,j,. (3.11) 

Every path from j to i may therefore result in a different transformation. This is 
analogous to the parallel transport operator from differential geometry that depends 
on the path connecting two points whenever the manifold has curvature (e.g., the 
sphere). Thus, when adding transformations of different paths, cancelations may 
happen. We would like to define the aflSnity between i and j as the consistency 
between these transformations, with higher affinity expressing more agreement among 
the transformations that are being averaged. To quantify this affinity, we consider 
again the matrix D~^S which is similar to the symmetric matrix 

S = £)-V25£)-i/2 (3.12) 

through D-^S = D^^/'^SD^/'^ and define the affinity between i and j as ||^^*(i, j)||l/s, 
that is, as the squared HS norm of the dxd matrix S'^*{i,j), which takes into account 
all paths of length 2t, where t is a positive integer. In a sense, \\j{i,j)\\'Hs measures 
not only the number of paths of length 2t connecting i and j but also the amount 
of agreement between their transformations. That is, for a fixed number of paths, 
\\^^*{hj)\\Hs larger when the path transformations are in agreement, and is smaller 
when they differ. 

Since S is symmetric, it has a complete set of eigenvectors Vi,V2, ... ,Vnd. and 
eigenvalues Ai, A2, . . . , Xnd- We order the eigenvalues in decreasing order of magnitude 
|Ai| > IA2I > . . . > |A„d|. The spectral decompositions of S and 5^* are given by 

nd nd 

S{i,j)=y\ivi{{)vt{jf, and S^\i,j) = J2x^\{i)vi{jf, (3.13) 

1=1 1=1 



where vi{i) € ior i = 1, . . . ,n and / = l,...,nd. The HS norm of S'^*{i,j) is 
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calculated using the trace: 



nd 



(3.14) 

It follows that the affinity ||S'^*(i, j)|||^5 is an inner product for the finite dimensional 
Hilbert space M^'"*'^ via the mapping Vt: 



That is, 



\\s''{^,j)rHs^m),vt{j)). 



(3.15) 



(3.16) 



Note that in the manifold learning setup, the embedding i i~> Vt{i) is invariant to 
the choice of basis for T^^M because the dot products {vi{i),Vr{i)) are invariant to 
orthogonal transformations. We refer to Vt as the vector diffusion mapping. 

From the symmetry of the dot products {vi{i),Vr{i)) — {vr{i),vi{i)) , it is clear 
that ||S'^*(i, j)|||f5 is also an inner product for the finite dimensional Hilbert space 
]grid(nd+i)/2 corresponding to the mapping 



{cir{XlXrY{vi{i),Vr{i)))^ 



<l<r<r. 



where 



Clr 



V2 

1 



/ < r, 
I = r. 



We define the symmetric vector diffusion distance c?vDM.t(*, j) between nodes i and j 



d^uuAhj) = {Vt{T^,Vt{i)) + {VtU),Vt(J)) - 2{Vt{i),Vt{j)). (3.17) 
The matrices I^S and I+S are positive semidefinite due to the following identity: 



\l±D-'^^SD-'/^)v= 

(''j)6S 



v{i) ^ Wi-jOijv{j) 



Vdcg(i) Vdeg(j) 



> 0, 



(3.18) 



for any v £ M"''. As a consequence, all eigenvalues A/ of S reside in the in terval 
[—1,1]. In particular, for large enough t, most terms of the form (AjA^)^* in (3.141 
are close to 0, and |jS'^*(i, j')|||^g can be well approximated by using only the few 
largest eigenvalues and their corresponding eigenvectors. This lends itself into an 
efficient approximation of the vector diffusion distances dyDM.tih j) of (3.17), and it 
is not necessary to raise the matrix S to its 2t power (which usually results in dense 
matrices). Thus, for any (5 > 0, we define the truncated vector diffusion mapping 
that embeds the data set in M™ (or equivalently, but more efficiently in ]R'"(™+i)/2) 
using the eigenvectors vi, . . . ,Vm as 



Vt' ■.^^{iX,Xrnvli^),Vri^)))'^^^, 
10 



(3.19) 



\ \ 2t / > \ 2t 



where m = 'm{t,S) is the largest integer for which ^i^J > ^^"^ \^ — / ~ ^' 

We remark that we define Vt through ||S'^*(«, rather than through |iS'*(z, j)|||^5, 

because we cannot guarantee that in general all eigenvalues of S are non-negative. In 
Section [8j we show that in the continuous setup of the manifold learning problem 
all eigenvalues are non-negative. We anticipate that for most practical applications 
that correspond to the manifold assumption, all negative eigenvalues (if any) would 
be small in magnitude (say, smaller than S). In such cases, one can use any real t > 
for the truncated vector diffusion map . 

4. Normalized Vector Diffusion Mappings. It is also possible to obtain 
slightly different vector diffusion mappings using different normalizations of the matrix 
5". These normalizations are similar to the ones used in the diffusion map framework 

For example, notice that 

wi = D-^/^vi (4.1) 

are the right eigenvectors of D^^S, that is, D^^Swi = Xiwi. We can thus define 
another vector diffusion mapping, denoted V/, as 



From (4.1 1 it follows that V/ and Vt satisfy the relations 



and 



As a result, 



= ^^Vti^), (4.3) 



deg(z)deg(j) - deg(jr " ^ ^^^ 

In other words, the Hilbert-Schmidt norm of the matrix D~^S leads to an embedding 
of the data set in a Hilbert space only upon proper normalization by the vertex degrees 
(similar to the normalization by the vertex degrees in (3.91 and (3.10) for the diffusion 
map). We define the associated vector diffusion distances as 

dl^M'Ahj) = {Vl{i)X{i)) + {V;{j)Xi3)) - 2{V;{^)Xm■ (4.6) 



The distances are related by rfvDM',t(«. .?') = dc'g'(»"deg(j) • 

We comment that the normalized mappings i 1— > and i ||^',|^||| that map 

the data points to the unit sphere are equivalent, that is, 

K(0 Vt{^ 



\vim \mi)\\ 



(4.7) 



This means that the angles between pairs of embedded points are the same for both 
mappings. For diffusion map, it has been observed that in some cases the distances 
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II - II II are more meaningful than ||'I>t(i) - $tO')|| (see, for example, [T7]). 

This may also suggest the usage of the distances || yj^l^jn — ptrW^^ VDM 
framework. 

Another important family of normalized diffusion mappings is obtained by the 
following procedure. Suppose < a < 1, and define the symmetric matrices Wa and 
5*0- as 

Wa ^ V-°'WV-°' , (4.8) 

and 

= D-^SD-". (4.9) 
We define the weighted degrees degQ,(l), . . . , deg„(n) corresponding to Wa by 

n 

deg„(i) = '^Wa{i,j), 
i=i 

the n X n diagonal matrix as 

Va{i,i)=dega{i), (4.10) 

and the n x n block diagonal matrix (with blocks of size d x d) as 

Daii,i) ^ deg^{i)Idxd- (4.11) 

We can then use the matrices Sa and Da (instead of S and D) to define the vector 
diffusion mappings Va.t and Va f Notice that for a = we have Sq = S and Dq = Z?, 
so that Vo,t = and t = ^t- The case a = 1 turns out to be especially important 
as discussed in the next Section. 

5. Convergence to the connection-Laplacian. For diffusion maps, the dis- 
crete random walk over the data points converges to a continuous diffusion process 
over that manifold in the limit n — )■ oo and e — > 0. This convergence can be stated in 
terms of the normalized graph Laplacian L given by 

L = V-^W - I. 

In the case where the data points {xiY^^^ are sampled independently from the uni- 
form distribution over M.''', the graph Laplacian converges pointwise to the Laplace- 
Beltrami operator, as we have the following proposition [261 [3l [35l [20] : If / • -^'^ ~^ ^ 
is a smooth function (e.g., / £ C^{A4)), then with high probability 

1 ^ 1 / 1 \ 

-5:i.,/(a:,) = ^^Mnx,) + 0(e+-^j^-^j^\ , (5.1) 

j=i \ / 

where Am is the Laplace-Beltrami operator on Ai^. The error consists of two terms: 
a bias term 0{e) and a variance term that decreases as l/y'n, but also depends on 
e. Balancing the two terms may lead to an optimal choice of the parameter e as a 
function of the number of points n. In the case of uniform sampling, Belkin and 
Niyogi [4 have shown that the eigenvectors of the graph Laplacian converge to the 
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eigenfunctions of the Laplace-Beltrami operator on the manifold, which is stronger 
than the pointwise convergence given in ( |5.1[ ). 

In the case where the data points {xi}^^-^ are independently sampled from a 
probability density function p{x) whose support is a cZ-dimensional manifold Ai'^ and 
satisfies some mild conditions, the graph Laplacian converges pointwise to the Fokker- 
Planck operator as stated in following proposition [26l [3l [35l [20] : If / G C^{Ai), then 
with high probability 

1^1 / 1 \ 

- J2 I^^^Ii^,) = 2 A^,/(x,) + VC/(x,) • Vf{x,) + O f e + , (5.2) 

where the potential term U is given by U{x) — — 21ogp(x). The error is interpreted in 
the same way as in the uniform sampling case. In [S] it is shown that it is possible to 
recover the Laplace-Beltrami operator also for non-u niform sampling processes using 



Wi and Vi (that correspond to a = 1 in (4.8) and (4.11 1). The matrix V^^Wi - I 



converges to the Laplace-Beltrami operator independently of the sampling density 
function p{x). 

For VDM, we prove in Appendix |B] the following theorem, Theorem |5.1[ that 
states that the matrix D~^Sa — I, where < a < 1, converges to the connection- 
Laplacian operator (defined via the covariant derivative, see Appendix [A| and |32) ^ 
plus some potential terms depending on p{x). In particular, D^^Si — I converges to 
the connection-Laplacian operator, without any additional potential terms. Using the 
terminology of spectral graph theory, it may thus be appropriate to call D^^Si — I 
the connection-Laplacian of the graph. 

The main content of Theorem |5.1| specifies the way in which VDM generalizes 
diffusion maps: while diffusion mapping is based on the heat kernel and the Laplace- 
Beltrami operator over scalar functions, VDM is based on the heat kernel and the 
connection-Laplacian over vector fields. While for diffusion maps, the computed eigen- 
vectors are discrete approximations of the Laplacian eigenfunctions, for VDM, the l-th 
eigenvector vi of D^^Si — / is a discrete approximation of the l-th eigen- vector field 
Xi of the connection-Laplacian over M, which satisfies V^X; = —XiXi for some 
A; > 0. 

In the formulation of the Theorem |5.1[ as well as in the remainder of the paper, 
we slightly change the notation used so far in the paper, as we denote the observed 
data points in Rp by i(xi), t(x2), . . . , t(a;„), where t : ^ M^" is the embedding 
of the Riemannian manifold A4 in W. Furthermore, we denote by t^Tj^.A^ the d- 
dimensional subspace of MP which is the embedding of T^^M in W. It is important 
to note that in the manifold learning setup, the manifold A4, the embedding l and 
the points Xi, X2, ■ ■ ■ , Xn & M are assumed to exist but cannot be directly observed. 

Theorem 5.1. Let l : A4 be a smooth d-dim closed Riemannian manifold 

embedded in MP , with metric g induced from the canonical metric on M^ . Let K € 
C2([0, 1)) be a positive function. For e > 0, let K, ixi,Xj) = K |^MM:^iii)JkiL^ for 

< ||i(a;i)— i(a;j)|| < ^/e, andK^ {xi,Xj) — otherwise. Let the data set {xi}i^i^,,,^n be 
independently distributed according to the probability density function p{x) supported 
on M, where p is uniformly bounded from below and above, that is, < p„i < Pix) < 
Pm < oo. Define the estimated probability density distribution by 

n 

Pe{Xi) = {Xi,Xj) 
J = l 
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and for < o; < 1 define the a-normalized kernel K^^a by 



Then, using epcA = 0{n^'^) for X G C^{TAA) and for all Xi with high probability 
(w.h.p.) 



TO2 

2dmQ 



<^ V^X{xi)+d 



Jg,_,WgX{x,)Ve{p'-''){x,)de 



+0 + e-\-^ + ^-i/2g-W4+i/2)^ 



m2 
2dmQ 



LAv^X{x,)+d 



Jg,_,WgX{x,)Ve{p'-''){x,)de 



,ei{xi) 



(5.3) 



1=1 



P^-°'{Xi) 

+0 + + ^-i/2g-(d/4+i/2)^ 

where is </ie connection- Laplacian, Xi = {{i*X{xi),ui{xi)))f^i S M"^ for all i, 
{ui(xi)}i=i^,,,^d is an orthonormal basis for a d- dimensional subspace ofW determined 
by local PCA (i.e., the columns of Oi), {ei{xi)}i^i^,,,^d is an orthonormal basis for 
b^T^.M., rui = Jjgi |jx||'i4r(||a;||)dx, and Oij is the optimal orthogonal transformation 
determined by the alignment procedure. In particular, when a = 1 we have 



E"=i-^e,i {x^,xj)0,jXj 

{Xz,X.j) 



X, 



2dmQ 



{{L,W^X{x,),ei{x,)))'l^^ (5.4) 



+0 + e-'n-^ + ^-i/2e-W4+i/2)^ 

Furthermore, for e — 0{n^'^), almost surely, 

^^j — l I^€,a {Xi, Xj') OijXj 



lim - 

n— >-C30 e 



7712 

2dmo 



X/ i = l -^e.ct {Xi, Xj) 



X, 



(5.5) 



L., \ V^X{x,)+d 



Js,-,VeX{x,)Ve{p^-''){x^)d9 \ 
pi-"(a;,) J 



eiixi) 



1=1 



and 



lim - 

n— ^oo e 



EjLl^e.l iXi,Xj)OijXj 
YTj = lKe,l {x^,Xj) 



x. 



7712 

2dmr 



{{L,W^X{x,),ei{x,)))^^^. (5.6) 



When epcA — 0(n^'^) then the same almost surely convergence results above hold 
but with a slower convergence rate. 

When the manifold is compact with boundary, (5.4) does not hold at the bound- 
ary. However, we have the following result for the convergence behavior near the 
boundary: 
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Theorem 5.2. Let l : Ai ^ W he a smooth d-dim compact Riemannian manifold 
with smooth boundary dA4 embedded in MP, with metric g induced from the canonical 
metric on Rp . Let {x,}i=i,,„^„, p{x), K^,i {xj x-,), p^Xi), {ei{xt)}i=i^,„^d and Xi be 



defined in the same way as in Theorem 



5.1 



Choose epcA — 0{n 'f+i). Denote 



M.^ — \x ^ M. : miiiy^gj^ d{x,y) < y/e}, where d{x,y) is the geodesic distance 
between x and y. 



When Xi € M.^? we have 



T,"=lKe.,l {x^,Xj)0^jXj 



t*Px,.xo [X{xo) + ^Wg,X{xQ)] ,ei{x,) 



+0(^e + n-^ + „-i/2,-(d/4-i/2)^ 



(5.7) 



where Xq — argmirij^gg^ d{xi,y), Pxi,xo is the parallel transport from xq to Xi along 
the geodesic linking them, m\ and rriQ are constants defined in (B.99) and (B.lOO), 
and dd is the normal direction to the boundary at xq. 

For the choice e = 0{n~^+^) (as in Theorem 5.1), the error appearing in (5.71 



is 0{e^^^) which is asymptotically smaller than 0{^/e), which is the order of 



A consequence of Theorem 5.1 Theorem 5.2 and the above discussion about the 
error terms is that the eigenvectors of D^^Si — I are discrete approximations of the 
eigen- vector-fields of the connection-Laplacian operator with homogeneous Neumann 
boundary condition that satisfy 



V^X{x) = -XX{x), 
Va.Xix) = 0, 



for X G M, 
for X e dM. 



(5.8) 



We remark that the Neumann boundary condition also emerges for the choice epcA = 
0(ri~3+2 ). This is due to the fact that the error in the local PCA term is 0(ep(?^) = 
0(n^3+5), which is asymptotically smaller than 0(e^/^) — 0(n^3+4) error term. 



Finally, Theorem 5.3 details the way in which the algorithm approximates the 
continuous heat kernel of the connection-Laplacian: 

Theorem 5.3. Let l. : M ^ W be a smooth d-dim compact Riemannian manifold 
embedded in M.^, with metric g induced from the canonical metric on and be 
the connection Laplacian. For < a < 1, define 



T,,c.X{x) = 



Jj^K,Ax,y)P^,yXiy)dV{y) 



Jj^K,A^,y)dV{y) 



where Px,y '■ TyM — > T^M is the parallel transport operator from y to x along the 
geodesic connecting them. 

Then, for any t > 0, the heat kernel e*^ can be approximated on L^iTM) (the 

space of squared-integrable vector fields) by T^'i, that is. 



lim T/i 



„tV^ 



6. Numerical simulations. In all numerical experiments reported in this Sec- 
tion, we use the normalized vector diffusion mapping V( ^ corresponding to a = 1 in 



(4.9) and (4.10), that is, we use the eigenvectors of ^Si to define the VDM. In all 

X[o.i] foi' the local PCA step 



experiments we used the kernel function K(u) 
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as well as for the definition of the weights Wij. The specific choices for e and epcA 
are detailed below. We remark that the results are not very sensitive to these choices, 
that is, similar results are obtained for a wide regime of parameters. 

The purpose of the first experiment is to numerically verify Theorem |5.1| using 
spheres of different dimensions. Specifically, we sampled n — 8000 points uniformly 
from S'^ embedded in for d = 2,3, 4, 5. Figure 6.1 shows bar plots of the largest 
30 eigenvalues of the matrix D^^Si for epcA — 0.1 when d — 2,3,4 and epcA — 0.2 

d + l 

when d = 5, and e — epc\. It is noticeable that the eigenvalues have numerical mul- 
tiplicities greater than 1. Since the connection-Laplacian commutes with rotations, 
the dimensions of its eigenspaces can be calculated using representation theory (see 
AppendixjC]). In particular, our calculation predicted the following dimensions for the 
eigenspaces of the largest eigenvalues: 



: 6, 10, 14, . 



: 4,6,9,16,16,, 



S"' : 5,10,14,.... S*^: 6, 15, 20, 



These dimensions are in full agreement with the bar plots shown in Figure |6.1 



I 



I 



20 25 30 




(a) 



(b) 53 





(d) S5 



Fig. 6.1. Bar plots of the largest 30 eigenvalues of ^ S\ for n 
distributed over spheres of different dimensions. 



8000 points uniformly 



In the second set of experiments, we numerically compare the vector diffusion 
distance, the diffusion distance, and the geodesic distance for different compact man- 
ifolds with and without boundaries. The comparison is performed for the following 
four manifolds: 1) the sphere embedded in M.^; 2) the torus embedded in M^; 
3) the interval [-tt,tt] in M; and 4) the square [0, 27r] x [0, 2t t\ in M? . For both VDM 
and DM we truncate the mappings using 5 = 0.2, see (3.19). The geodesic distance 



is computed by the algorithm of Dijkstra on a weighted graph, whose vertices corre- 
spond to the data points, the edges link data points whose Euclidean distance is less 
than y^, and the weights WQ{i,j) are the Euclidean distances, that is. 



wcihj) 



-|-oo 



2;^ — II < y/e, 
otherwise. 



{x G R3 : 11x11 = 1} C 
0.316. For the truncated vector diffusion 



S"^ case: we sampled n = 5000 points uniformly from S'^ 
M'^ and set epcA = 0.1 and e = ^epcA 
distance, when t — 10, we find that the number of eigenvectors whose eigenvalue is 
larger (in magnitude) than X{S is myDM — fnyDuit = 10,6 = 0.2) = 16 (recall the 
definition of m{t,6) that appears after (3.19)). The corresponding embedded dimen- 
sion is mvDM('7ivDM+l)/2, which in this case is 16-17/2 = 136. Similarly, fort = 100, 
m-VDM — 6 (embedded dimension is 6 - 7/2 = 21), and when t = 1000, mvDM = 6 
(embedded dimension is again 21). Although the first eigenspace (corresponding the 
largest eigenvalue) of the connection-Laplacian over S*^ is of dimension 6, there are 
small discrepancies between the top 6 numerically computed eigenvalues, due to the 
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finite sampling. This numerical discrepancy is amplified upon raising the eigenval- 
ues to the i'th power, when t is large, e.g., t — 1000. For demonstration purposes, 
we remedy this numerical effect by artificially setting A/ = Ai for I = 2,.. .,6. For 
the truncated diffusion distance, when t = 10, todm = 36 (embedded dimension is 
36 — 1 = 35), when t = 100, todm = 4 (embedded dimension is 3), and when t — 1000, 
TTT-UM — 4 (embedded dimension is 3). Similarly, we have the same numerical effect 
when t = 1000, that is, /i2, Ma ^-iid ^re close but not exactly the same, so we again 
set III — /i2 for I = 3,4. The results are shown in Figure [672] 




(d) dDM,t=io (e) rfDM,t=ioo (f) 'iDM,t=iooo (g) Geodesic distance 



Fig. 6.2. S case. Top: truncated vector diffusion distances for t = 10, t = 100 and t = 1000; 
Bottom: truncated diffusion distances for t = 10, t = 100 and t = 1000, and the geodesic distance. 
The reference point from which distances are computed is marked in red. 

case: we sampled n = 5000 points (u, v) uniformly over the square [0, 27r) x 
[0, 27r) and then mapped them to using the following transformation that defines 
the surface as 

^ {((2 + cos(u))cos(u),(2 + cos(w))sin(w),sin(u)) : {u,v) e [0, 27r) x [0,27r)} C 

Notice that the resulting sample points are non-uniformly distributed over . There- 
fore, the usage of Si and Di instead of S and D is important if we want the eigen- 
vectors to approximate the eigen- vector-fields of the connection-Laplacian over T^. 
We used epcA = 0.2 and e = ^/epcA ~ 0.447, and find that for the truncated vector 
diffusion distance, when t = 10, the embedded dimension is 2628, when t — 100, the 
embedded dimension is 36, and when t = 1000, the embedded dimension is 3. For 
the truncated diffusion distance, when t = 10, the embedded dimension is 130, when 
t = 100, the embedded dimension is 14, and when t = 1000, the embedded dimension 
is 2. The results are shown in Figure [63] 

1-dim interval case: we sampled n = 5000 equally spaced grid points from the 
interval [— 7r,7r] C and set epcA = 0.01 and e = Cp'cA ~ 0.158. For the truncated 
vector diffusion distance, when t = 10, the embedded dimension is 120, when t = 100, 
the embedded dimension is 15, and when t = 1000, the embedded dimension is 3. For 
the truncated diffusion distance, when t — 10, the embedded dimension is 36, when 
t = 100, the embedded dimension is 11, and when t = 1000, the embedded dimension 



is 3. The results are shown in Figure 6.4 
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(a) rfvDM',t=10 



(b) rfvDM',t=100 



(c) (ivDM',t=1000 




Bottom: truncated diffusion distances for t = 10, t = 100 and t = 1000, and the geodesic distance. 
The reference point from which distances are computed is marked in red. 




Square case: we sampled n — 6561 — 81^ equally spaced grid points from the 
square [0, 2tt] x [0, 2tt] and fix epcA — 0.01 and e = ^/£pcA = 0.1. For the truncated 
vector diffusion distance, when t = 10, the embedded dimension is 20100 (we only 
calculate the first 200 eigenvalues), when t = 100, the embedded dimension is 1596, 
and when t = 1000, the embedded dimension is 36. For the truncated diffusion 
distance, when t — 10, the embedded dimension is 200 (we only calculate the first 200 
eigenvalues), when t = 100, the embedded dimension is 200, and when t = 1000, the 



embedded dimension is 28. The results are shown in Figure 6.5 



7. Out-of-sample extension of vector fields. Let X — {xi}"^^-^ and y = 

{UiYiLi so that X,y C Ai'^, where A4 is embedded in Rp by t. Suppose X is a smooth 
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(a) rfvDM',t=10 (b) '^VDM',t=100 (c) 'ivDM',t=1000 




(d) dDM,t=io (e) rfDM,t=ioo (f) c'DM,t=iooo (g) Geodesic distance 



Fig. 6.5. Square case. Top: truncated vector diffusion distances for t = 10, t = 100 and 
t = 1000; Bottom: truncated diffusion distances for t = 10, t = 100 and t = 1000, and the geodesic 
distance. The reference point from which distances are computed is marked in red. 

vector field that we observe only on X and want to extend to y. That is, we observe 
the vectors l^X(xi), . . . , L.^,X{xn) G and want to estimate L.^,X{yl), . . . , L^X{yjn)- 
The set X is assumed to be fixed, while the points in y may arrive on-the-fly and need 
to be processed in real time. We propose the following Nystrom scheme for extending 
X from A" to 3^. 

In the preprocessing step we use the points xi, . . . ,x„ for local PCA, alignment 
and vector diffusion mapping as described in Sections [2] and [3j That is, using local 
PCA, we find the p x d matrices Oi {i = l,...,ri), such that the columns of Oi 
are an orthonormal basis for a subspace that approximates the embedded tangent 
plane l^,Tx-A4] using alignment we find the orthonormal d x d matrices Oij that 
approximate the parallel transport operator from T^-Ai to T^-Ad; and using Wij and 
Oij we construct the matrices S and D and compute (a subset of) the eigenvectors 
Ui, W2, . . . , Vnd and eigenvalues Ai, . . . , A„rf of D~^S. 

We project the embedded vector field L^,X{xi) E Rp into the d-dimcnsional sub- 
space spanned by the columns of Oi , and define Xi e M"* as 

X, = Ofi.Xix^). (7.1) 

We represent the vector field X on A" by the vector x of length nd, organized as n 
vectors of length d, with 

x{i) = Xi, for i = 1, . . . ,n. 
We use the orthonormal basis of eigen- vector-fields vi, . . . , Vnd to decompose x as 

nd 

x = ^aivi, (7.2) 
1=1 

where a; = x'^vi. This concludes the preprocessing computations. 

Suppose y E y is a. "new" out-of-sample point. First, we perform the local PCA 
step to find a p x d matrix, denoted Oy, whose columns form an orthonormal basis 
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to a d-dimensional subspace of W that approximates the embedded tangent plane 
L^TyAi. The local PC A step uses only the neighbors of y among the points in X (but 
not in y) inside a ball of radius y^epcA centered at y. 

Next, we use the alignment process to compute the dxd orthonormal matrix Oy^i 
between Xi and y by setting 

Oy^i = argmin \\OyO., ~ 0\hs- 
oeoid) 

Notice that the eigen- vector-fields satisfy 

_ 1 J2^=lKe{\\x^- Xj\\)0^.jVl{j) 

We denote the extension of vi to the point y by vi{y) and define it as 

iE]=iK Ah -xj\\)Oy^Mj) 
My) = T — ^rjT. • 7.3 

To finish the extrapolation problem, we denote the extension of a; to y by x{y) and 
define it as 

m{S) 



xiy) = 2^ aiviiy), (7.4) 
1=1 

where m{6) — max; | A; | > S, and (5 > is some fixed parameter to ensu re th e numerical 



stability of the extension procedure (due to the division by A; in (7.3), g can be 
regarded as the condition number of the extension procedure). The vector L^X(y) S 
W is estimated as 

i,X{y) = Oyx{y). (7.5) 
8. The continuous case: heat kernels. As discussed earlier, in the limit 



oo and e — )■ considered in (5.2), the normalized graph Laplacian converges to 



the Laplace-Beltrami operator, which is the generator of the heat kernel for functions 



(0-forms). Similarly, in the limit n — oo considered in (5.3 1, we get the connection 
Laplacian operator, which is the generator of a heat kernel for vector fields (or 1- 
forms). The connection Laplacian is a self-adjoint, second order elliptic operator 
defined over the tangent bundle TA4. It is well-known [TB] that the spectrum of 
is discrete inside and the only possible accumulation point is — oo. We will denote 
the spectrum as {— A^j^g, where < Aq < Ai.... From the classical elliptic theory, 
see for example [16], we know that e*^ has the kernel 

CO 

kt{x, y)^Yl e"^"*^«(x) «) Xniy). 

71=0 

where V^X„ = ~A„X„. Also, the eigenvector-fields X„ of form an orthonormal 
basis of L?{TM.). In the continuous setup, we define the vector diffusion distance 
between x,y ^ A4 using \\kt{x,y)\\'jfg. An explicit calculation gives 

\\kt{x,y)\\Hs = Ti[kt{x,y)kt{x,yy] 

OO 

= J2 e-(^"+^")*(X„(a;),X^(x))(X„(y),X^(y)). (8.1) 

n,m— 
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It is well known that the heat kernel kt{x,y) is smooth in x and y and analytic in t 
[IB] , so for t > we can define a family of vector diffusion mappings Vt , that map any 
X € Ai into the Hilbert space £^ by: 

Vf.x^ (e-(^"+^'")*/2^X„(x),X„(x)))" , (8.2) 

\ / n,m— 

which satisfies 

\\kt{x,y)\\j,s = {Vt{x),Vt{y))e-- (8-3) 

The vector diffusion distance dvDM.tl^:, 2/) between x G M and y G M is defined as 

rfvDM,t(2:,y) ||Ft(x)-yt(y)||,2, (8.4) 

which is clearly a distance function over A^. In practice, due to the decay of e^'^^"+'^'"''*, 
only pairs {n, m) for which A„ + A„j is not too large are needed to get a good approx- 
imation of this vector diffusion distance. Like in the discrete case, the dot products 
{Xn{x), Xm{x)) are invariant to the choice of basis for the tangent space at x. 



We now study some properties of the vector diffusion map Vt (8.2). First, we 
claim for all t > Q, the vector diffusion mapping Vt is an embedding of the compact 
Riemannian manifold M into l"^ . 

Theorem 8.1. Given a d-dim closed Riemannian manifold {Ai,g) and an 
orthonormal basis {^nj^o '^f L^(TAi) composed of the eigen-vector- fields of the 
connection- Laplacian V^, then for any t > 0, the vector diffusion map Vt is a diffeo- 
morphic embedding of M into P' . 

Proof. We show that Vt : — > is continuous in x by noting that 

oo 

\\Vt{x) - Vtml = E e-(^"+^-)*((X„(a:),X™(x)) - {X^{y),X„,{y))f 

71,171 — 

= Ti{kt(x,x)kt{x,x)*) +Tr{kt{y,y)kt{y,y)*) - 2Tv{kt{x,y)kt{x,y)*) 

(8.5) 

From the continuity of the kernel kt{x,y), it is clear that ||Vt(a::) — Vt{y)\\^2 — >■ 
as 2/ — >■ x. Since Ai is compact, it follows that Vt{Ai) is compact in £'^. Then we 
show that Vt is one-to-one. Fix x ^ y and a smooth vector field X that satisfies 
{X{x),X{x)) ^ {X{y),X{y)). Since the eigen-vector fields {^„}^o form a basis to 
L'^iTM), we have 

oo 



X{z) = c„Xn{z), for all z G M, 

n=0 

where c„ = / {X,X„)dV. As a result, 
Jm 

oo 

{Xiz),Xiz))^ E 

{X^iz),X^{z)). 



n,m— 



Since {X{x),X{x)) ^ {X (y) , X (y)) , there exist n, m G N such that (Xnix), Xm{x)) ^ 
{Xn{y), Xm{y)) , which shows that Vt{x) 7^ Vf(y), i.e., Vt is one-to-one. From the fact 



21 



that the map Vt is continuous and one-to-one from Ai, which is compact, onto Vt(A^), 

we conclude that Vt is an embedding. 

□ 

Next, we demonstrate the asymptotic behavior of the vector diffusion distance 
dyuM.t{x, y) and the diffusion distance d-DM,t{x, y) when t is small and x is close to y. 
The following theorem shows that in this asymptotic limit both the vector diffusion 
distance and the diffusion distance behave like the geodesic distance. 

Theorem 8.2. Let {Ai,g) he a smooth d-dim closed Riemannian manifold. Sup- 
pose x,y £ Ai so that x — exp^^ v, where v G TyAi. For any t > 0, when ^ t ^ 1 
we have the following asymptotic expansion of the vector diffusion distance: 

dloMM y) - rf(4^)-'f^ + 0{t-'') 

Similarly, when ^ t <C f, we have the following asymptotic expansion of the 

diffusion distance: 

Proof. Fix y and a normal coordinate around y. Denote y) = | det((i„ expj^)|, 
where x — expj^(w), v S T^Ai. Suppose ||t;|| is small enough so that x = expj^(w) is 
away from the cut locus of y. It is well known that the heat kernel kt{x,y) for the 
connection Laplacian over the vector bundle £ possesses the following asymptotic 
expansion when x and y are close: [SJ p. 84] or [III 

\\d^{h{x,y) - k^ix^yMi = 0{t^-'/'-'/'-'), (8.6) 

where || • ||/ is the C' norm, 

N 

k^{x, y) (47rt)-'^/2e-ll"ll'/4*j(x, yY^'^ ^ f^,{x, y), (8.7) 

N > d/2, and is a smooth section of the vector bundle £ ® S* over M. x Ai. 
Moreover, $o(a;, y) = Px.y is the parallel transport from £y to £x. In the VDM setup, 
we take £ = TAi, the tangent bundle of Ai. Also, by O Proposition 1.28], we have 
the following expansion: 

j{x, y) - 1 + Ric(t,, v)/6 + 0{\\vf). (8.8) 



Equations (8.7) and (18. 81) lead to the following expansion under the assumption 



Wvr^t: 

Tr{kt{x,y)kt{x,yy) 

= (47ri)-'^e-ll'^ll'/2*(l + Ric(i;, v)/6 + 0(1^1^))-^ Tr((P,,, + 0(t))((P,,, + 0(i))*) 
= (47ri)-''e-ll"ll'/2*(i _ Ric(z;,t>)/6 + Oi\\v\f))id + 0{t)) 

= (d + 0(<))(4^t)-'^(l-^+0 



2t \ t^ 

In particular, for ||u|| = we have 

TT{kt{x,x)kt{x,xy) = {d + OityiATTty^ 
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Thus, for < i < 1, we have 



dynuA^'y) = T^'^{h{x,x)kt{x, x)*) + Tr{kt{y,y)kt{y,y)*) - 2Tr{kt{x,y)kt{x,y)*) 
= d(47r)-'^M!+o(i-<i). (8.9) 

By the same argument we can carry out the asymptotic expansion of the diffu- 
sion distance rfoM.tl^c, y). Denote the eigenfunctions and eigenvalues of the Laplace- 
Bchrami operator A by 0„ and /i„. We can rewrite the diffusion distance as follows: 

CO 

c^DM,t(2;,2/) X! e"^" (a:) - (l)n{y)f kt{x,x) + kt{y,y) - 2kt{x,y), (8.10) 

where kt is the heat kernel of the Laplace-Beltrami operator. Note that the Laplace- 
Beltrami operator is equal to the connection-Laplacian operator defined over the triv- 



ial line bundle over A4. As a result, equation (8.7) also describes the asymptotic 



expansion of the heat kernel for the Laplace-Beltrami operator as 

ktix, y) = (47ri)-''/'e-"""'/"*(l + Ric(«, v)/Q + 0{\\vf))-^'\l + 0{t)). 
Put these facts together, we obtain 

dUMy) = (47r)-'^/2 +Or'*/2), (8.11) 

when < i < 1. □ 

9. Application of VDM to Cryo-Electron Microscopy. Besides being a 
general framework for data analysis and manifold learning, VDM is useful for perform- 
ing robust multi-reference rotational alignment of objects, such as one-dimensional pe- 
riodic signals, two-dimensional images and three-dimensional shapes. In this Section, 
we briefly describe the application of VDM to a particular multi-reference rotational 
alignment problem of two-dimensional images that arise in the field of cryo-electron 
microscopy (EM). A more comprehensive study of this problem can be found in [3 6) 
and [in] . It can be regarded as a prototypical multi-reference alignment problem, and 
we expect many other multi-reference alignment problems that arise in areas such as 
computer vision and computer graphics to benefit from the proposed approach. 

The goal in cryo-EM [Hj is to determine 3D macromolecular structures from noisy 
projection images taken at unknown random orientations by an electron microscope, 
i.e., a random Computational Tomography (CT). Determining 3D macromolecular 
structures for large biological molecules remains vitally important, as witnessed, for 
example, by the 2003 Chemistry Nobel Prize, co-awarded to R. MacKinnon for resolv- 
ing the 3D structure of the Shaker K'^ channel protein, and by the 2009 Chemistry 
Nobel Prize, awarded to V. Ramakrishnan, T. Steitz and A. Yonath for studies of the 
structure and function of the ribosome. The standard procedure for structure deter- 
mination of large molecules is X-ray crystallography. The challenge in this method is 
often more in the crystallization itself than in the interpretation of the X-ray results, 
since many large proteins have so far withstood all attempts to crystallize them. 

In cryo-EM, an alternative to X-ray crystallography, the sample of macromolecules 
is rapidly frozen in an ice layer so thin that their tomographic projections are typi- 
cally disjoint; this seems the most promising alternative for large molecules that defy 
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Fig. 9.1. Schematic drawing of the imaging process: every projection image corresponds to 
some unknown 3D rotation of the unknown molecule. 



crystallization. The cryo-EM imaging process produces a large collection of tomo- 
graphic projections of the same molecule, corresponding to different and unknown 
projection orientations. The goal is to reconstruct the three-dimensional structure of 
the molecule from such unlabeled projection images, where data sets typically range 
from 10"* to 10^ projection images whose size is roughly 100 x 100 pixels. The intensity 
of the pixels in a given projection image is proportional to the line integrals of the 
electric potential induced by the molecule along the path of the imaging electrons 



(see Figure 9.11. The highly intense electron beam destroys the frozen molecule and 
it is therefore impractical to take projection images of the same molecule at known 
different directions as in the case of classical CT. In other words, a single molecule 
can be imaged only once, rendering an extremely low signal-to-noise ratio (SNR) for 



the images (see Figure 9.2 for a sample of real microscope images), mostly due to shot 
noise induced by the maximal allowed electron dose (other sources of noise include 
the varying width of the ice layer and partial knowledge of the contrast function of 
the microscope). In the basic homogeneity setting considered hereafter, all imaged 
molecules are assumed to have the exact same structure; they differ only by their 
spatial rotation. Every image is a projection of the same molecule but at an unknown 
random three-dimensional rotation, and the cryo-EM problem is to find the three- 
dimensional structure of the molecule from a collection of noisy projection images. 

The rotation group SO (3) is the group of all orientation preserving orthogonal 
transformations about the origin of the three-dimensional Euclidean space under 
the operation of composition. Any 3D rotation can be expressed using a 3 x 3 orthog- 
/ I I I \ 

onal matrix i? ( R^ | satisfying = = /axs and det i? = 1. 
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Fig. 9.2. A collection of four real electron microscope images of the E. coli SOS ribosomal 
subunit; courtesy of Dr. Fred Sigworth. 



The column vectors E},B?,B? of R form an orthonormal basis to 'E? . To each pro- 
jection image P there corresponds a 3 x 3 unknown rotation matrix R describing its 



orientation (see Figure 9.1 ). Excluding the contribution of noise, the intensity P{x, y) 
of the pixel located at (x, y) in the image plane corresponds to the line integral of the 
electric potential induced by the molecule along the path of the imaging electrons, 
that is, 



P{x,y)^ (t){xR^ + yR^ + zR^) dz (9.1) 

J —oo 

where </> : M'^ !-> M is the electric potential of the molecule in some fixed 'laboratory' 



coordinate system. The projection operator ( 9.1 ) is also known as the X-ray transform 
[29]. 

We therefore identify the third column R^ of R as the imaging direction, also 
known as the viewing angle of the molecule. The first two columns R^ and R^ form 
an orthonormal basis for the plane in perpendicular to the viewing angle R^ . All 
clean projection images of the molecule that share the same viewing angle look the 
same up to some in-plane rotation. That is, if Ri and Rj are two rotations with the 
same viewing angle i?f = then R} , R^ and i?j , i?| are two orthonormal bases for the 
same plane. On the other hand, two rotations with opposite viewing angles R^ = —Rj 
give rise to two projection images that are the same after reflection (mirroring) and 
some in-plane rotation. 

As projection images in cryo-EM have extremely low SNR, a crucial initial step 
in all reconstruction methods is "class averaging" [T31 SI]. Class averaging is the 
grouping of a large data set of n noisy raw projection images Pi, . . . , P„ into clusters, 
such that images within a single cluster have similar viewing angles (it is possible 
to artificially double the number of projection images by including all mirrored im- 
ages). Averaging rotationally-aligned noisy images within each cluster results in "class 
averages"; these are images that enjoy a higher SNR and are used in later cryo-EM 
procedures such as the angular reconstitution procedure [IHl that requires better qual- 
ity images. Finding consistent class averages is challenging due to the high level of 
noise in the raw images as well as the large size of the image data set. A sketch of 
the class averaging procedure is shown in Figure |9.3| 

Penczek, Zhu and Frank |31j introduced the rotationally invariant K-means clus- 
tering procedure to identify images that have similar viewing angles. Their Rotation- 
ally Invariant Distance dmD{i,j) between image Pi and image Pj is defined as the 
Euclidean distance between the images when they are optimally aligned with respect 
to in-plane rotations (assuming the images are centered) 

dRiD(«,i)= inin \\P,-R{e)Pj\\, (9.2) 

6^[0,2tt) 
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(a) Clean image 



(b) Pr 



(d) Average 



Fig. 9.3. Ii^gjl A clean simulated proj ection image of the ribosomal subunit generated from its 
known volume; \(b)\ Noisy instance of \(a)\ denoted Pi, obtained by the addition of white Gaussian 
noise. For the simulated images we chose the SNR to be higher than that of experimental images 
in order for image features to be clearly visible; \( c )\ Noi sy projection, denoted Pj, taken at the same 
viewing angle but with a different in-plane rotation; ] (dj\ Averaging the noisy images \(b )\ and \( c )\ after 
in-plane rotational alignment. The class average of the two images has a higher SNR than that of 
the noisy images\(b)\ and\(c)\ and it has better similarity with the clean imageUaJl 



where R{0) is the rotation operator of an image by an angle 9 in the counterclockwise 
direction. Prior to computing the invariant distances of (9.2), a common practice is 
to center all images by correlating them with their total average ^ X^iLi '^hich is 
approximately radial (i.e., has little angular variation) due to the randomness in the 
rotations. The resulting centers usually miss the true centers by only a few pixels (as 
can be validated in simulations during the refinement procedure). Therefore, like |31j . 
we also choose to focus on the more challenging problem of rotational alignment by 
assuming that the images are properly centered, while the problem of translational 
alignment can be solved later by solving an overdetermined linear system. 

It is worth noting that the specific choice of metric to measure proximity between 
images can make a big difference in class averaging. The cross-correlation or Euclidean 
distance (9.2) are by no means optimal measures of proximity. In practice, it is 
common to denoise the images prior to computing their pairwise distances. Although 
the discussion which follows is independent of the particular choice of filter or distance 
metric, we emphasize that filtering can have a dramatic effect on finding meaningful 
class averages. 

The invariant distance between noisy images that share the same viewing angle 
(with perhaps a different in-plane rotation) is expected to be small. Ideally, all neigh- 
boring images of some reference image Pi in a small invariant distance ball centered at 
Pj should have similar viewing angles, and averaging such neighboring images (after 
proper rotational alignment) would amplify the signal and diminish the noise. 

Unfortunately, due to the low SNR, it often happens that two images of completely 
different viewing angles have a small invariant distance. This can happen when the 
realizations of the noise in the two images match well for some random in-plane 
rotational angle, leading to spurious neighbor identification. Therefore, averaging the 
nearest neighbor images can sometimes yield a poor estimate of the true signal in the 
reference image. 

The histograms of Figure 9.5 demonstrate the ability of small rotationally invari- 
ant distances to identify images with similar viewing directions. For each image we 
use the rotationally invariant distances to find its 40 nearest neighbors among the 
entire set of n = 40, 000 images. In our simulation we know the original viewing 
directions, so for each image we compute the angles (in degrees) between the viewing 
direction of the image and the viewing directions of its 40 neighbors. Small angles 

26 



(a) Clean (b) SNR=1 (c) SNR=l/2 (d) SNR=l/4 (e) SNR=l/8 




Fig. 9.4. Simulated projection with various levels of additive Gaussian white noise. 



indicate successful identification of "true" neighbors that belong to a small spherical 
cap, while large angles correspond to outliers. We see that for SNR=l/2 there are 
no outliers, and all the viewing directions of the neighbors belong to a spherical cap 
whose opening angle is about 8°. However, for lower values of the SNR, there are 
outliers, indicated by arbitrarily large angles (all the way to 180°). 




Fig. 9.5. Histograms of the angle (in degrees, x-axis) between the viewing directions of 40,000 
images and the viewing directions of their 40 nearest neighboring images as found by computing the 
rotationally invariant distances. 



Clustering algorithms, such as the K-means algorithm, perform much better than 
nai've nearest neighbors averaging, because they take into account all pairwise dis- 
tances, not just distances to the reference image. Such clustering procedures are 
based on the philosophy that images that share a similar viewing angle with the ref- 
erence image are expected to have a small invariant distance not only to the reference 
image but also to all other images with similar viewing angles. This observation 
was utilized in the rotationally invariant K-means clustering algorithm [3T]. Still, 
due to noise, the rotationally invariant K-means clustering algorithm may suffer from 
misidentifications at the low SNR values present in experimental data. 

VDM is a natural algorithmic framework for the class averaging problem, as it can 
further improve the detection of neighboring images even at lower SNR values. The 
rotationally invariant distance neglects an important piece of information, namely. 



the optimal angle that realizes the best rotational alignment in (9.2) 



% =argmin IIP,- i?(0)Pj II, i,j^l,...,n. (9.3) 

e6[0,27r) 

In VDM, we use the optimal in-plane rotation angles dij to define the orthogonal 
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transformations Oij and to construct the matrix S in (3.1). The eigenvectors and 



eigenvalues of D^^S (other normalizations of S are also possible) are then used to 
define the vector diffusion distances between images. 

This VDM based classification method is proven to be quite powerful in practice. 
We applied it to a set of n — 40, 000 noisy images with SNR=l/64. For every image we 
find the 40 nearest neighbors using the vector diffusion metric. In the simulation we 
know the viewing directions of the images, and we compute for each pair of neighbors 
the angle (in degrees) between their viewing directions. The histogram of these angles 
is shown in Figure |9.6| (Left panel) . About 92% of the identified images belong to a 
small spherical cap of opening angle 20°, whereas this percentage is only about 65% 
when neighbors are identified by the rotationally invariant distances (Right panel). 
We remark that for SNR=l/50, the percentage of correctly identified images by the 
VDM method goes up to about 98%. 
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(a) Neighbors are identified using dvDM' t—2 



(b) Neighbors are identified using ^h^id 



Fig. 9.6. SNR=l/6A: Histogram of the angles (x-axis, in degrees) between the viewing directions 
of each image (out o/40000j and it 40 neighboring images. Left: neighbors are post identified using 
vector diffusion distances. Right: neighbors are identified using the original rotationally invariant 
distances dmo. 



The main advantage of the algorithm presented here is that it successfully iden- 
tifies images with similar viewing angles even in the presence of a large number of 
spurious neighbors, that is, even when many pairs of images with viewing angles that 
are far apart have relatively small rotationally invariant distances. In other words, 
the VDM-based algorithm is shown to be robust to outliers. 

10. Summary and Discussion. This paper introduced vector diffusion maps, 
an algorithmic and mathematical framework for analyzing data sets where scalar 
affinities between data points are accompanied with orthogonal transformations. The 
consistency among the orthogonal transformations along different paths that connect 
any fixed pair of data points is used to define an affinity between them. We showed 
that this affinity is equivalent to an inner product, giving rise to the embedding of the 
data points in a Hilbert space and to the definition of distances between data points, 
to which we referred as vector diffusion distances. 

For data sets of images, the orthogonal transformations and the scalar affinities 
are naturally obtained via the procedure of optimal registration. The registration 
process seeks to find the optimal alignment of two images over some class of transfor- 
mations (also known as deformations), such as rotations, refiections, translations and 
dilations. For the purpose of vector diffusion mapping, we extract from the optimal 
deformation only the corresponding orthogonal transformation (rotation and reflec- 
tion) . We demonstrated the usefulness of the vector diffusion map framework in the 
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organization of noisy cryo-elcctron microscopy images, an important step towards re- 
solving three-dimensional structures of macromolecules. Optimal registration is often 
used in various mainstream problems in computer vision and computer graphics, for 
example, in optimal matching of three-dimensional shapes. We therefore expect the 
vector diffusion map framework to become a useful tool in such applications. 

In the case of manifold learning, where the data set is a collection of points in a 
high dimensional Euclidean space, but with a low dimensional Riemannian manifold 
structure, we detailed the construction of the orthogonal transformations via the 
optimal alignment of the orthonormal bases of the tangent spaces. These bases are 
found using the classical procedure of PGA. Under certain mild conditions about 
the sampling process of the manifold, we proved that the orthogonal transformation 
obtained by the alignment procedure approximates the parallel transport operator 
between the tangent spaces. The proof required careful analysis of the local PGA 
step which we believe is interesting of it own. Furthermore, we proved that if the 
manifold is sampled uniformly, then the matrix that lies at the heart of the vector 
diffusion map framework approximates the connection-Laplacian operator. Following 
spectral graph theory terminology, we call that matrix the connection-Laplacian of 
the graph. Using different normalizations of the matrix we proved convergence to the 
connection-Laplacian operator also for the case of non-uniform sampling. We showed 
that the vector diffusion mapping is an embedding and proved its relation with the 
geodesic distance using the asymptotic expansion of the heat kernel for vector fields. 
These results provide the mathematical foundation for the algorithmic framework that 
underlies the vector diffusion mapping. 

We expect many possible extensions and generalizations of the vector diffusion 
mapping framework. We conclude by mentioning a few of them. 

• The topology of the data. In we showed how the vector diffusion mapping 
can determine if a manifold is orientable or non-orientable, and in the latter 
case to embed its double covering in a Euclidean space. To that end we used 
the information in the determinant of the optimal orthogonal transformation 
between bases of nearby tangent spaces. In other words, we used just the op- 
timal reflection between two orthonormal bases. This simple example shows 
that vector diffusion mapping can be used to extract topological informa- 
tion from the point cloud. We expect more topological information can be 
extracted using appropriate modifications of the vector diffusion mapping. 

• Hodge and higher order Laplacians. Using tensor products of the optimal 
orthogonal transformations it is possible to construct higher order connection- 
Laplacians that act on p-forms {p > 1). The index theorem [T^ relates 
topological structure with geometrical structure. For example, the so-called 
Betti numbers are related to the multiplicities of the harmonic p-forms of 
the Hodge Laplacian. For the extraction of topological information it would 
therefore be useful to modify our construction in order to approximate the 
Hodge Laplacian instead of the connection-Laplacian. 

• Multiscale, sparse and robust PC A. In the manifold learning case, an impor- 
tant step of our algorithm is local PGA for estimating the bases for tangent 
spaces at different data points. In the description of the algorithm, a single 
scale parameter epcA is used for all data points. It is conceivable that a better 
estimation can be obtained by choosing a different, location-dependent scale 
parameter. A better estimation of the tangent space Tx^J^ may be obtained 
by using a location-dependent scale parameter epcA,i due to several reasons: 
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non-uniform sampling of the manifold, varying curvature of the manifold, and 
global effects such as different pieces of the manifold that are almost touching 
at some points (i.e., varying "condition number" of the manifold). Choosing 
the correct scale epcA,i is a problem of its own interest that was recently con- 
sidered in [28] , where a multiscale approach was taken to resolve the optimal 
scale. We recommend the incorporation of such multiscale PCA approaches 
into the vector diffusion mapping framework. Another difficulty that we may 
face when dealing with real-life data sets is that the underlying assumption 
about the data points being located exactly on a low-dimensional manifold 
does not necessarily hold. In practice, the data points are expected to reside 
off the manifold, either due to measurement noise or due to the imperfection 
of the low-dimensional manifold model assumption. It is therefore necessary 
to estimate the tangent spaces in the presence of noise. Noise is a limiting 
factor for successful estimation of the tangent space, especially when the data 
set is embedded in a high dimensional space and noise effects all coordinates 
[23] . We expect recent methods for robust PCA [7] and sparse PCA [6] [24] 
to improve the estimation of the tangent spaces and as a result to become 
useful in the vector diffusion map framework. 

• Random matrix theory and noise sensitivity. The matrix S that lies at the 
heart of the vector diffusion map is a block matrix whose blocks are either 
dx d orthogonal matrices Oij or the zero blocks. We anticipate that for some 
applications the measurement of Oij would be imprecise and noisy. In such 
cases, the matrix S can be viewed as a random matrix and we expect tools 
from random matrix theory to be useful in analyzing the noise sensitivity of 
its eigenvectors and eigenvalues. The noise model may also allow for out- 
liers, for example, orthogonal matrices that are uniformly distributed over 
the orthogonal group 0{d) (according to the Haar measure). Notice that the 
expected value of such random orthogonal matrices is zero, which leads to 
robustness of the eigenvectors and eigenvalues even in the presence of large 
number of outliers (see, for example, the random matrix theory analysis in 

• Compact and non-compact groups and their matrix representation. As men- 
tioned earlier, the vector diffusion mapping is a natural framework to organize 
data sets for which the affinities and transformations are obtained from an 
optimal alignment process over some class of transformations (deformations). 
In this paper we focused on utilizing orthogonal transformations. At this 
point the reader have probably asked herself the following question: Is the 
method limited to orthogonal transformations, or is it possible to utilize other 
groups of transformations such as translations, dilations, and more? We note 
that the orthogonal group 0{d) is a compact group that has a matrix rep- 
resentation and remark that the vector diffusion mapping framework can be 
extended to such groups of transformations without much difficulty. However, 
the extension to non-compact groups, such as the Euclidean group of rigid 
transformation, the general linear group of invertible matrices and the special 
linear group is less obvious. Such groups arise naturally in various applica- 
tions, rendering the importance of extending the vector diffusion mapping to 
the case of non-compact groups. 
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Appendix A. Some Differential Geometry Background. The purpose of 
this appendix is to provide the required mathematical background for readers who are 
not familiar with concepts such as the parallel transport operator, connection, and 
the connection Laplacian. We illustrate these concepts by considering a surface A4 

embedded in M^. 

Given a function f{x) : M'^ — > M, its gradient vector field is given by 



Through the gradient, we can find the rate of change of / at a; e in a given 
direction u € R^, using the directional derivative: 



It is natural to extend the derivative notion to a given vector field X at x € by 
mimicking the derivative definition for functions in the following way: 





fix + tv) - f{v) 



By chain rule we have vf{x) = Vf{x){v). Define Vvf{x) := Vf{x){v). 
Let X be a vector field on M^, 

X{x,y,z) = {fi{x,y,z),f2{x,y,z),f3{x,y,z)). 




(A.1) 



t 
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where t; G M^. Following the same notation for the directional derivative of a function, 
we denote this limit by V^X(a;). This quantity tells us that at x, following the 
direction u, we compare the vector field at two points x and x + tv, and see how the 
vector field changes. While this definition looks good at first sight, we now explain 
that it has certain shortcomings that need to be fixed in order to generalize it to the 
case of a surface embedded in M.^. 

Consider a two dimensional smooth surface A4 embedded in M.'^ by l. Fix a point 
X & M and a smooth curve j{t) : (— e, e) — > C M'^, where e ^ 1 and 7(0) — x. 
7'(0) S M'^ is called a tangent vector to at a;. The 2 dimensional affine space 
spanned by the collection of all tangent vectors to 7W at a; is defined to be the tangent 
plane at x and denoted bj^T^^TM, which is a two dimensional affine space inside K^, 
as illustrated in Figure |A.1| (left panel) . Having defined the tangent plane at each 
point X G Ai, we define a vector field X over to be a differentiable map that maps 
a; to a tangent vector in T^rA^]^ 






Fig. A.l. 

derivative 



Left: a tangent plane and a curve 7; Middle: a vector field; Right: the covariant 



(A.ll 



We now generalize the definition of the derivative of a vector field over M.^ 
to define the derivative of a vector field over A4. The first difficulty we face is how 
to make sense of "X{x + tv)" , since x + tv does not belong to Ai. This difficulty 



can be tackled easily by changing the definition ( A.l ) a bit by considering the curve 
so that 7(0) = X and 7'(0) = v. Thus, (|A.l[) becomes 



7 : i-e,e)^ : 



lim 



X(7(t))-X(7(0)) 



t 



(A.2) 



where v € M'^. In M, the existence of the curve 7 : (— e, e) — > M'^ so that 7(0) = x 
and 7'(0) = u is guaranteed by the classical ordinary differential equation theory. 
However, (A.2) still cannot be generalized to A4 directly even though X{'-f{t)) is well 
defined. The difiiculty we face here is how to compare X(j{t)) and X{x), that is, 
how to make sense of the subtraction X{'j{t)) — X{'j{0)). It is not obvious since a 
priori we do not know how T^^t)Ai and r.y(o)A^ are related. The way we proceed is 
by defining an important notion in differential geometry called "parallel transport" , 
which plays an essential role in our VDM framework. 

Fix a point x € M and a vector field X on M, and consider a parametrized curve 
7 : (— e, e) — > so that 7(0) = x. Define a vector valued function V : (— e, e) — > M'^ 



"Here we abuse notation slightly. Usually TxM defined here is understood as the embedded 
tangent plane by the embedding t of the tangent plane at x. Please see 1321 for a rigorous definition 
of the tangent plane. 

^See 1321 for the exact notion of differentiability. Here, again, we abuse notation slightly. Usually 
X defined here is understood as the embedded vector field by the embedding t of the vector field X. 
For the rigorous definition of a vector field, please see I32| . 
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by restricting X to 7, that is, V{t) — X{"f{t)). The derivative of V is weU defined as 
usual: 



dt t^o t 

where /i e (— e, e). The covariant derivative ^^{h) is defined as the projection of 
^(/i) onto T^(;i)A^. Then, using the definition of ^{h), we consider the following 
equation: 

^(t)=0 
W{0) = w 

where w G T-y(o)A^. The solution W{t) exists by the classical ordinary differential 
equation theory. The solution W(t) along "f{t) is called the parallel vector field along 
the curve 7(i), and we also call W{t) the parallel transport of w along the curve j{t) 
and denote W{t) — ,^(0)^- 

We come back to address the initial problem: how to define the "derivative" of 
a given vector field over a surface M. We define the covariant derivative of a given 
vector field X over A4 as follows: 

where 7 : (-e,e) M with 7(0) = x & M, 7'(0) = -y G T^(o)7W. This definition 
says that if we want to analyze how a given vector field &t x £ Ai changes along the 
direction we choose a curve 7 so that 7(0) = x and 7'(0) — v, and then "transport" 
the vector field value at point ^{t) to 7(0) = a: so that the comparison of the two 
tangent planes makes sense. The key fact of the whole story is that without applying 
parallel transport to transport the vector at point -f{t) to T^(^q-)M, then the subtraction 
^(7(0) "^(7(0)) € in general does not live on T^A4, which distorts the notion of 
derivative. For comparison, let us reconsider the definition ( A.l[ ). Since at each point 



X e M , the tangent plane at x is T^R^ — M^, the substraction Xix^tv^ — Xix) always 
makes sense. To be more precise, the true meaning of X{x + iu) is P^(o),7(t)^(7(^))i 
where -P-y(o),7(t) = id., and ^(t) = x + tv. 

With the above definition, when X and Y are two vector fields on A4, we define 
to be a new vector field on M so that 

VxYix) :- Vx(:.)Y. 

Note that X{x) € T^M. We call V a connection on A^j^ 

Once we know how to differentiate a vector field over , it is natural to consider 
the second order differentiation of a vector field. The second order differentiation of 
a vector field is a natural notion in . For example, we can define a second order 
differentiation of a vector field X over as follows: 

V2X:= V,V,X + VyVyX + V,V,X, (A.4) 

where a;, y, z are standard unit vectors corresponding to the three axes. This definition 
can be generalized to a vector field over as follows: 

V^X{x) := VE,yE^X{x) + VB,V£,X(a;), (A.5) 



^''The notion of connection can be quite general. For our purposes, tliis definition is sufficient. 
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where X is a vector field over M, x G M, and Ei , E2 are two vector fields on M that 
satisfy ^EiEj = for i,j = 1, 2. The condition V ^ . E j — (for i,j = 1, 2) is needed 



for technical reasons. Note that in the M case (A.4), if we set Ei = x, E2 = y and 
E3 = z, then y EiEj — for «, j = 1, 2, 3[^ The operator is called the connection 
Laplacian operator, which lies in the heart of the VDM framework. The notion of 
eigen-vector-field over Ai is defined to be the solution of the following equation: 

V^X{x) = XX{x) 

for some A G R. The existence and other properties of the eigen-vector-fields can be 
found in |16| . Finally, we comment that all the above definitions can be extended 
to the general manifold setup without much difficulty, where, roughly speaking, a 
"manifold" is the higher dimensional generalization of a surface^ 



Appendix B. Proof of Theorem pTlj , Theorem |5.2| and Theorem |5.3[ 

Before stating and proving the theorems, we set up the notation that is used 
throughout this Appendix. Let l : A4 ^ MP he a. smooth d-dim compact Riemannian 
manifold embedded in W, with metric g induced from the canonical metric on W. 
Denote Ait = {x € M : min^gg^ d(x, y) < t}, where d{x, y) is the geodesic distance 
between x and y. The data points Xi,X2, ■ ■ ■ ,Xn are independent samples from M. 
according to the probability density function p € C^{Ai) supported on C and 
satisfies < p{x) < 00. We assume that the kernels used in the local PCA step 
and for the construction of the matrix S are in C2([0,l]). Although these kernels 
can be different we denote both of them by K and expect their meaning to be clear 
from the context. Denote r to be the largest number having the property: the open 
normal bundle about A4 of radius r is embedded in for every r < t [30] ■ This 
condition holds automatically since At is compact. In all theorems, we assume that 
y/e < T. In 30J, 1/t is referred to as the "condition number" of Ai. We denote 
Py.x ■ TxAi TyAi to be the parallel transport from x to y along the geodesic 
linking them. Denote by V the connection over TAi and the connection Laplacian 
over Ai. Denote by TZ, Ric, and s the curvature tensor, the Ricci curvature, and the 
scalar curvature of Ai, respectively. The second fundamental form of the embedding 
t is denoted by 11. To ease notation, in the sequel we use the same notation V to 
denote different connections on different bundles whenever there is no confusion and 
the meaning is clear from the context. 

We divide the proof of Theorem |5 . 1 1 into four theorems, each of which has its own 
interest. The first theorem, Theorem IB. 11 states that the columns of the matrix Oi 



that are found by local PCA (see (2.1 1) form an orthonormal basis to a d-dimensional 



subspace of that approximates the embedded tangent plane L^^TxiAi. The proven 



order of approximation is crucial for proving Theorem 5.1 The proof of Theorem B.l 
involves geometry and probability theory. 

Theorem B.l. If epcA — 0{n^^+^) and Xi ^ -A^^/j^, then, with high probabil- 
ity (w.h.p.), the columns {ui{xi)}f^^ of the p x d matrix Oi which is determined by 
local PCA, form an orthonormal basis to a d-dim subspace ofW that deviates from 
L^Tx -Ai by 0{ey^j^), in the following sense: 

min ||Of e. - 0\\hs = 0(4c^) = Oin~^^), (B.l) 
oeo(d) 



^^Please see 1321 for details. 

^^We will not provide details in the manifold setting, and refer readers to standard differential 
geometry textbooks, such as |32| . 
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where Qi is a px d matrix whose columns form an orthonormal basis to l^,Tx^A4. Let 
the minimizer in [B.l] be 



= argmin||Ofe,-0||Hs, (B.2) 
oeo(d) 

and denote by Qi the p x d matrix 

:= e,df , (B.3) 

and ei{xi) the l-th column of Qi. The columns of Qi form an orthonormal basis to 
Li,Tx-A4, and 

\\0,-Q,\\Hs = 0{epcA). (B.4) 
If Xi G Ai^/Y^, then, w.h.p. 

min ||Of a, - 0\\hs = 0{4l^) = 0{n-^^). 
oeO(d) 

Better convergence near the boundary is obtained for tpcA — 0{n'^'^), which 
gives 

min llOf e. - 0\\hs = 0(4ca) = 0{n-^^), 

OEO{dj 

for Xi e M,yT^, and 

min llOf e, - 0\\hs = 0(4''c^) = ©(n"^), (B.5) 

for X, ^ M^jr^. 

Theorem |B.1| may seem a bit counterintuitive at first glance. When considering 
data points in a ball of radius y^epcA, it is expected that the order of approximation 



would be O(epcA), while equation (B.l) indicates that the order of approximation is 
higher (3/2 instead of 1). The true order of approximation for the tangent space, as 
observed in ( B.4[ ) is still 0(e). The improvement observed in (B.l I is of relevance to 



Theorem |B . 2 1 and we relate it to the probabilistic nature of the PC A procedure, more 
specifically, to a large deviation result for the error in the law of large numbers for the 
covariance matrix that underlies PCA. Since the convergence of PCA is slower near 
the boundary, then for manifolds with boundary we need a smaller epcA- Specifically, 
for manifolds without boundary we choose epcA — 0{n^^+^) and for manifolds with 
boundary we choose epcA = 0{n~^+^). We remark that the first choice works also 
for manifolds with boundary at the expense of a slower convergence rate. 



The second theorem. Theorem B.2 states that the d x d orthonormal matrix 
Oij, which is the output of the alignment procedure (2.4), approximates the parallel 



transport operator Px^.xj from Xj to Xi along the geodesic connecting them. Assuming 
that ll^i— Xjll — 0{y/e) (here, e is different than epcA): the order of this approximation 

is 0(epcA + e'^^^) whenever Xi,Xj are away from the boundary. This result is crucial 
for proving Theorem |5.1| The proof of Theorem |B.2| uses Theorem |B.1| and is purely 
geometric. 

Theorem B.2. Consider Xi, Xj ^ A^y^^^ satisfying that the geodesic distance 
between Xi and Xj is 0{y^). For epcA — 0{n^^+^), w.h.p., Oij approximates Pxi,xj 



in the following sense: 

0,,X, = {{l,P^^^,^X{xj),ui{x,)))'I^^ + 0(6^^^ + £3/2), for all X e C\TM), 

(B.6) 

where Xi = {{L^,X{xi),ui{xi)))f^-^ G M.'^ , and {ui{xi)}f^i is an orthonormal set deter- 
mined by local PC A. For Xi, Xj € A^yj^^ 

0,,X, = {{L,P^^,,^X{xj),ui{xi))YU + 0{€^pIa + for all X e C\TM), 

(B.7) 

For epcA — 0{n^^^), the orders of epcA in the error terms change according to 
Theorem \B.1\ 



The third theorem, Theorem B.3 states that the n x n block matrix D^^Sa is 



a discrete approximation of an integral operator over smooth sections of the tangent 
bundle. The integral operator involves the parallel transport operator. The proof of 
Theorem B.3 mainly uses probability theory. 

Theorem B.3. Suppose epcA — 0{n~'^), and for < a < 1, define the 
estimated probability density distribution by 

n 

Pe{x^) = '^K^ {Xi,Xj) 



and the normalized kernel K^ ^ by 



(^Xi , Xj ) 

pf{x,)pf{xj)' 

where K, (a;,, x^) = K (M^hl^^pYkl.'^ . 
For Xi ^ Ad/Yf^ we have w.h.p. 



= l ^e,Q [Xi, Xj) 

+0 



jl/2gd/4-l/2 



3/2 3/2 
^PCA + 



where 



le,aX(Xi) - , (B.9j 

J^K,Axi,y}dV{y) 

Xi = {{L^,X{xi), ui{xi)))f^-^ G M'', {'Ui{xi)}f=i i^ orthonormal set determined by lo- 
cal PC A, X € C^(TA4), and Oij is the optimal orthogonal transformation determined 
by the alignment procedure. 

For Xi G A^y^^ we have w.h.p. 

^]=i,jr^i^<^,aixi,Xj)OijXj ,/ rr vf \ I \\\d m 1n^ 

{{l*T^^aX\Xi),Ui{Xi)))i^^ (B.IO) 
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For tpcA — 0{n^'^) the orders of epcA the error terms change according to 
Theorem \B.l\ 

The fourth theorem, Theorem B.4[ states that the operator T^ a can be expanded 
in powers of -^/e, where the leading order term is the identity operator, the second 
order term is the connection-Laplacian operator plus some possible potential terms, 
and the first and third order terms vanish for vector fields that are sufficiently smooth. 
For a = 1, the potential terms vanish, and as a result, the second order term is the 
connection-Laplacian. The proof is based on geometry. 

Theorem B.4. For X e C^{TM) and x ^ Mf^ we have: 



7712 

2dmo 



(B.ll) 

where „ is defined in (B.9), is the connection-Laplacian over vector fields, and 
Tn, =^;||x|r/C(||x||)da;. 



Corollary B.5. Under the same conditions and notations as in Theorem B.4 
if X e C^{TM), then for all x ^ we have: 



7772 



T,^iX{x) = X{x) + e—^V'Xix) + O(e^). 

2(27770 

Putting Theorems [RT][R3| a nd [R4| together, we now prove Theore m |5.1[ 



(B.12) 



Proof [Proof of Theorem 5.1 Suppose Xi ^ A^^- By Theorem 

= K^^a (x-i, Xj) OijX 



B.3 



.h.p. 



{{''*Te,aX{Xi),Ui{Xi)))i^-^ 



(B.13) 



1 



3/2 
^PCA 



' „l/2£<i/4-l/2 

[{L^T^,aX(xi),ei{xi)))'j;^^ 



-0 



1 



,l/2,(i/4-l/2 



3/2 
^PCA 



^3/2 



^3/2 



where epcA = 0{n '^+^ ), and we used Theorem B.l to replace ui{xi) by ei{xi). Using 
Theorem B.4 for the right hand side of (B.13 1, we get 

Sj = l j-^i Ki,a i^i, Xj) OijXj 



X]j = lj-^i Ke,a {Xi, Xj) 



= ( i*X{xi) + e 



7772 

2dmn 



T72 A ^ Js"-^ '^eXix,)Vg{p'-"){x,)dd 
V X[Xi) + a— 



P^-^iXr) 



■,ei{xi) 



1=1 



+0 



1 



ll/2£d/4-l/2 



3/2 , ,3/2 
^PCA + ' 



For e = 0(77 upon dividing by e, the three error terms are 



,l/2pd/4+l/2 



1 0/9 d+8 

el/2 ^ o{n-^). 
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Clearly the three error terms vanish as n -> oo. Specifically, the dominant error is 
0(ri^3+3) which is the same as 0{y/e). As a result, in the limit n — > oo, almost sm^ely, 

Sj = lj>ii Ke,a {Xi, Xj) OijXj _^ 
X]j = l ^e,a {Xi, Xj) 

as required. □ 

B.l. Preliminary Lemmas. For the proofs of Theorem |B.l||B.4l we need the 
following Lemmas. 

Lemma B.6. In polar coordinates around x ^ M., the Riemannaian measure is 
given by 

dF(exp^ t9) = J{t, e)AtAe, 
where 9 G T^M, \\e\\ 1, t > 0, and 

Jit, 9) = f^-^ + t'^+^Ric{9, 9) + 0{t'^+^). 



lim - 



Proof. Please see [32] • D 

The following Lemma is needed in Theorem |B.1| and |B.2| 

Lemma B.7. Fix x a M and denote exp^, the exponential map at x and expj*^^^ 
the exponential map at l{x). With the identification ofT^(^^)W with W , for v € T^M. 
with \\v\\ <C 1 we have 

L o exp^(«) = Lix) + diiv) + luiv, v) + ^V„n(z;, v) + Oi\\vf). (B.14) 

2 D 

Furthermore, for w € T^Ai = M."^, we have 

d[Lo expX M =d[Lo exp^]„^o {w)+n{v, w) + ^ V„n(t>, w) + ^ V^^(^;, v) + 0{\\vf) 

(B.15) 

Proof Denote = (exp^^^^) ^ o to exp^, that is, 

t o exp^ = expf(^^) o(/). (B.16) 

Note that 0(0) = 0. Since (j) can be viewed as a function from T^M = to T^(^j.-^W = 
W, we can Taylor expand it to get 

toexp,(z;) = expf'^) (^dcf>\o{v) + ^Wd(l>\o{v,v) + ^V^d<l)\o{v,v,v) + Oi\\vf )^ . 

We claim that 

V'^dexp^ |o(w, . . . , u) = for all fc > L (B.17) 

Indeed, from the definition of the exponential map we have that c?exp^ G T{T*TxA4 (8) 
TM) and 

Vdexp.^{v'{t),v'{t)) = Vdoxp,(i,'(t))dexp^(i;'(i)) - dexp^(V.„'(t)w'(t)), 
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where v G T^M, v{t) = tv e T^M, and v'{t) = v e TfyT^M. When evaluated at 
t = 0, we get the claim (B.17I for k = 1. The result for A; > 2 follows from a similar 
argument. 



We view dt as a smooth section of Hom{TAA^TW). Thus, by combining (B.17I 



with the chain rule, from (B.16) we have 



V'=#|o(w, . . . , w) = V''(iiU(v, . . . , w) for all A: > 0, 
and hence we obtain 

toexp,(i;) = expf"^) (^dL{v) + ]^VdL{v,v) + '^V'' dL{v , v , v) + 0{\\vf)^ . 



To conclude (B.14), note that for all v £ T.xA4, we have expj^^^(w) = l.(x) + v for all 
V e T,(^)MP ifweidentify r,,(^)MP with Rp. Next consider vector fields U, V and W 
around x so that U{x) = u, V{x) — v and W{x) = w, where u,v,w G T^M. A direct 
calculation gives 

VdiiV, W) ^ {Vvdi)W ^ VvidLiW)) - diiVyW) 

which is by definition the second fundamental form Il{V, W) of the embedding l. 
Similarly, we have 

W^diiU, V, W) = {Wl^ydL)W = {Wu{'^vdi))W - {W^^vdL)W 
^ Vc/((VydOVK) - WvdiiWuW) - {W^^vdL)W 
^Vu(JliV,W))-UiV,VuW)-U{VuV,W) =: (Vc/n)(F, W^), 



Evaluating Vdi(F, V) and V^dL{V, V, V) at x gives us (B.14| 



Next, when w G TyT^M and v e T^M, since d [i o exp^]^ (w) G T^ooxp^ v'^'' ^ K^, 
we can view d [t o exp^.]. (lu) as a function from T^Ad ^ M"^' to Rp. Thus, when ||u|| is 
small enough, Taylor expansion gives us 

d [i o expj^ (w) ^d[Lo expJo (u;)+V(d [t o expj. {w)Mv) + ^V^{d [l o expj. {w))\o{v, v)+0{\\vf), 

here d and V are understood as the ordinary differentiation over M.'^. To simplify the 
following calculation, for u,v,w £ R'^, we denote 

HUv) = ^v,„n(i;,i;) + lw^n{w,v) + ^V^n(w,w), 

ODD 

and 



G^{u,v) = -(V^n(M,i;) + V„n(w,w) + V„n(M,w)). 



Note that we again identify R'^ with T^Ai in the following calculation. By (B.14) 
when \\v\\ is small enough, we have 

,r 1 / N ,. t o exp^(u + (5w) — /, o exp„(w) 
d I o exp^l (w) = hm ^-^^^ / i-^?^ 



= lim 

5^0 



dL{dw) + U{v, Sw) + SH{v) + R{v + Sw) - R{v) 
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where R{v) is the remainder term in the Taylor expansion: 



|a|=4 

from which it fohows that 

R{v + Sw) - R{v) 



and as a result 



d[Lo expX M = Mw) + U{v, w) + H{v) + 0{\\v\\ \\w\\). (B.18) 



Similarly, from (B.18 1, when ||w|| is small enough we have 

V(d[toexp^J. (w))|„(w) = Imi 

= II{v,w) + G{u,v) + 0(||u||||w||||w||). 



(B.19) 



Finally, from (B.19) we have 

VM[^oexpJ M)|o(., .) ^ hm ° ^"P-]- 7 ° ^"P-]- ("^^1°^") 

(5-i.O 



G(z;,i;). 



(B.20) 



Thus, from (B.18 1 we have that 

d[ioexp^]o (w) = di{w), 



from (B.19) we have that 



V(c;[toexp^]. {w))\o(v) = n{v,w), 



and from (B.20 1 we have that 



G{v,v) = ^v^n{v,w) + ^v^n(w,«) 



Putting it all together we get (B.15) as required. □ 

Lemma B.8. Suppose x,y £ Ai such that y = exp^itO), where 9 € T^M and 
\\6\\ = 1. //i < 1, then h = \\i{x) - L{y)\\ < 1 satisfies 



t = h + -\\ii{e,0)\\h^ + o{h'). 



(B.21) 



Proof. Please see [5] or apply (B.14) directly. □ 

Lemma B.9. Fix x G M and y = ex.p^{t6), where 9 e T^M and \\9\\ = 1. Let 
{di{xy}f^-^ he the normal coordinate on a neighborhood U of x, then for a sufficiently 
small t, we have: 



uPy,xdi{x) = L,di{x) + tn{9,di{x)) + -^gn{9,di{x)) 

+ yVs,(,)n(0,0) - ^^L,Py^^{'R{9,di{x))9) + 0{t^). 



(B.22) 
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for all I — \, . . . ,d. 

Proof. Choose an open subset U C A4 small enough and find an open neigh- 
borhood B of G TxAi so that exp^, : B ^ U is difFeomorphic. It is well known 
that 



t 



where Ji{t) is the Jacobi field with J/(0) — and Vf J;(0) — di{x). By applying 
Taylor's expansion in a neighborhood of < = 0, we have 



Mt) = Py,. [ Mo) + tVtMo) + -^viMo) + jVtMo) 



Since J;(0) = J;(0) = 0, the following relationship holds: 



di{exp^{te)) = Py,, ( Vt Jz(0) + -V^ Ji(0) ) + 0(t'') 



Py,xdi{x) + -Py^,{n[e,di{x))e) + oit"). 



Thus we obtain 



Py,^di{x) = a,(exp,(t0)) - -Py^,{n{e,di{x))e) + o{t'), 



On the other hand, from (B.15) in Lemma B.7 we have 



(B.23) 



(B.24) 



t,9,(exp^(i0)) = iMx) + m{e,di{x)) + ^Ven(0,9,(x)) + ^Va,(,)n(0,0) + 0(^3). 

(B.25) 



Putting (B.24I and (B.25I together, it follows that for / = 1, . . . , d: 



i*Py..xdi{x) = t,a,(exp,(i0)) - ^-L,Py^^{TZ{e,di{x))e) + 0{r') 

= L,di{x) + m{e,di{x)) + -^gn{e,di{x)) 



□ 



(B.26) 



B.2. [Proof of Theorem 

standard orthonormal basis of K 



B.l 



Proof. Fix Xi ^ M 



Denote {vk]^i^^i the 



}' , that is, Ufc has 1 in the fc-tli entry and elsewhere. 
We can properly translate and rotate the embedding t so that i{xi) = 0, the first 
d components {wi,...,?;^} C MP form the orthonormal basis of i■^,Tx^Ai, and find 
a normal coordinate {dk}'i^i around Xi so that L^,dk{xi) = Vk. Instead of directly 
analyzing the matrix Bi that appears in the local PCA procedure given in (2.3), we 
analyze the covariance matrix := BiBf , whose eigenvectors coincide with the left 
singular vectors of Bi. We rewrite as 
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(B.27) 



where 



F,=K 



and 



F,{k,l)=K 



\i{Xi) ~ i{Xj)\\ 
V^PCA 

i{xi) - L{Xj)\\i 



{i{x,)-i{xM^{xo)~^{^^)f. (B.28) 



(/-(Xj) - L{Xi),Vk){i{Xj) ~ L{Xi),Vl). (B.29) 



Denote B ^ epcA i^i) ^'^ geodesic ball of radius ^epcA around a;^. We apply 

the same variance error analysis as in [351 Section 3] to approximate S^. Since the 
points Xi are independent identically distributed (i.i.d.), Fj, j ^ i, are also i.i.d., by 
the law of large numbers one expects 



1 " 

— J2f,^ef, 



(B.30) 



where F — Fi, 



EF 



KepcA{xz,y){L{y) - L{x^)){L{y) ~ L{xi)fp{y)dV{y), (B.31) 



and 



EF{k,l) 



KepcK{xi^y){'^iv) - i'{xi),Vk){i{v) ~ L{xi),vi)p[y)AV{y). 



(B.32) 



In order to evaluate the first moment EF(fc, I) of (B.32 ), we note that for y — exp^. v, 
where v £ T^.M, by ( |B.14 ) in Lemma [B.7| we have 

('•(exPa;^ v) - L{xi),Vk) = {i*v,Vk) + l^{U(v,v),Vk) + ^ (Vt,n(w, i;), Ufe) + Odlull"*). 

2 b 

(B.33) 



Substituting (B.33 1 into (B.32), applying Taylor's expansion, and combining Lemma 
Jan 



|B.8|and Lemma [B.6[ we have 

KepcAi^t^y){''iy) - t-(xi),Vk){i{y) - L{x,),vi)p{y)dV(y) 



V^PCA 



s<*-i Jo 



K 



( V'^PCA ) ( V*^PCA ) 



(B.34) 



[e{i,e,vk){i.e,vi) + ^^[{n{e,e),vk){L,e,vi) + {n{e,o),vi){i,9,vk)) +o(i^)} 



V^PCA 



dM6', (B.35) 



where (B.35) holds since integrals involving odd powers of must vanish due to 
the symmetry of the sphere S"*"^. Note that (t*0,Wfc) = when k = d + 1, . . . ,p. 
Therefore, 

EF(A:, /) - I o(,p^^<i/2+2) otherwise. ^^'^^^ 
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where D — Jgd-i \{i.*9,vi)\'^d9 i^(u)u'^+^du is a positive constant. 
Similar considerations give the second moment oi F{k,l) as 

r 0(epcA''/'+') for A;, ; = !,..., d, 
E[F{k,lf] = l 0(epcA''/2+4) for = 
[ 0(epcA''/^+^) otherwise. 

Hence, the variance of F{k, I) becomes 



(B.37) 



0(epcA'*/2+2) forfc,Z = l,..., 
VarF(fc,0=<( 0(epcA'*/^+'') forfc,Z = d+l, 
0(epcA'*^^^"^) otherwise. 



(B.38) 



We now move on to estabhsh a large deviation bound on the estimation of 
^^3Y i^j(fc, Z) by its mean EFj{k,l). For that purpose, we measure the devia- 
tion from the mean value by a and define its probability by 



Pk.iin,a) := Pr 



(B.39) 



To establish an upper bound for the probability Pk,iin, a), we use Bernstein's inequal- 
ity, see, e.g., [55]. Define 

Yj{kJ) := Fj{k,l) -EF{k,l). 

Clearly Yj(fc, I) are zero mean i.i.d. random variables. From the definition of Fj{k, I) 
(see |B.28 and |B.29 ) and from the calculation of its first moment (B.36), it follows 
that Yj(k,l) are bounded random variables. More specifically. 



O(epcA) 



for fc, Z = 1, 



YjikJ) 



O(epcA^) forfc,Z = <i+l, 
0(epcA^^^) otherwise. 



(B.40) 



Consider first the case k,l — 1, . . . , d, for which Bernstein's inequality gives 



Pk,i{'^-,a) < exp 



{n - l)a2 
'2E(yi(fc,02) + O(epcA)a 



< exp < — 



{?2 ~ \)a^ 
0(epcA'^/2+2) + 0(epcA)a 
(B.41) 



From (B.41 1 it follows that w.h.p. 

a = 

provided that 



/, <i/4+l 

I nV2 



1 



ni/2epcA'*/'^ 
Similarly, for A:,Z = (i-|-l,...,p, we have 

Pk,i{n,a) < exp 



< 1. 



(n- l)a2 



(B.42) 



0(epcA'^/2+4) + 0{epcA^)a 
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which means that w.h.p. 



epcA 



d/4+2 



provided (B.42). Finally, for A; = d + 1. 
k — 1 , . . . , d, we have 



Pk,i{n,a) < exp 
which means that w.h.p. 



ni/2 
...,p, I = 1, 

{n- l)a2 



,d or / = d + 1, . . 



0(epcA''/2+3) + 0{epcA^/^)c 



= O 



epcA 



d/4+3/2 



,1/2 



provided (B.42 ). The condition (B.42 ) is quite intuitive as it is equivalent to nepcA'*^^ 2> 
1, which says that the expected number of points inside B ^ epcA (^») large. 
As a result, when (B.42) holds, w.h.p., the covariance matrix Sj is given by 

(B.43) 



+ 



epcA 



d/4+l 



Idxd Odxp— d 

Op— dxd Op— rfxp— d 

0(1) 0(1) 
0(1) 0(1) 

0(1) 0(epcA^/') 
0(epcA^/') O(epcA) 



(B.44) 
(B.45) 



where Idxd is the identity matrix of size d x d, and Omxm' is the zero matrix of 
size m X m' . The error term in (B.44) is the bias term due to the curvature of the 
manifold, while the error term in (B.45) is the variance term due to finite sampling 
(i.e., finite n). In particular, under the condition in the statement of the theorem for 

2 

the sampling rate, namely, epcA = 0(n^3+5), we have w.h.p. 



+epCA''/^+^ 
= epcA'^'+'lD 



Idxd ^dxp—d 
Op— dxd Op— (ixp— d 

0(1) 0(1) 
0(1) 0(1) 

Idxd Odxp— d 

Op— dxd Op— dxp— d 



(B.46) 



epcA 



d/2+3/2 



0(1) 0(epcAi/') 
0(epcA^/') O(epcA) 

0(epcA^/') O(epcA) 
O(epcA) O(epcA) 



Note that by definition is symmetric, so we rewrite ( |B.46[ ) as 



^ + epcA^^^^ epcaO 
epcAC*-^ epcA^ 



(B.47) 



where / is the dxd identity matrix, A is a dx d symmetric matrix, O is a dx (p^d) ma- 
trix, and S is a (p—d) x (p—d) symmetric matrix. All entries of A, B, and C are 0(1). 
Denote by u^. and A^, k = 1, . . . ,p, the eigenvectors and eigenvalues of S^, where the 
eigenvectors are orthonormal, and the eigenvalues are ordered in a decreasing order. 
Using regular perturbation theory, we find that \k = DepcA'^^'^^^ (l + 0(epcA^^^)) 
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(for k — 1, . . . ,d), and that the expansion of the first d eigenvectors {uk}'l^i is given 

by 



Uk 



[0(epcA)]p_dxi 



(B.48) 



where {wk}f^i are orthonormal eigenvectors of A satisfying Awk — A^w^- Indeed, a 
direct calculation gives us 



I + epcA^/^A epcAC 
cpcaC'-^ epcA^ 



Wk + 4cA«3/2 + epcAf'2 + 0(epcA^/^) 
epcA^i + 40^^3/2 + O(epcA^) 



(B.49) 



Wk + epcA^/"^ Awk + epcA^/^i^3/2 + epcA^(^V3/2 +V2 + Czi) + ©(epcA^^^) 
evcAC^Wk + epcA^-B^i + 0(epcA^/^) 

where Vz/2tV2 G and zi, Z3/2 G MP^*^. On the other hand, 

i/2\^_L<: 2x 5/2NN r Wfc + epcA^/^W3/2 + epcA^i'2 + 0(epcA'''/^) 

(1 + epcA' + epcA A2 + C(epcA ' jj ^ ^ , ^ 3/2^ 1 n/'. 2^ 

L epcA^i + epcA ' 23/2 + (^(epcA j 

(B.50) 

Wk + epcA^^^A^Wfe + epcA^/^f3/2 + epcA^(A^W2 + i'3/2 + A2Wfc) + 0(epcA^/^) 
epcA^;! + epcA^/^(A^zi + Z3/2) + O(epcA^) 



where A2 G K. Matching orders of epcA between (|B.49|) and (|B.50|), we conclude that 



C Wk, 



O(epcA) : 
0(epcA^^^) : 23/2 - -y^k 

O(epcA') : {A - Xtl)v3/2^ \2Wk - CC^Wk- 



(B.51) 



Note that the matrix [A — A^/) appearing in (B.51) is singular and its null space 
is spanned by the vector Wk, so the solvability condition is A2 = ||C"^ii;fc||^/||wfe||^. 
We mention that v4 is a generic symmetric matrix generated due to random finite 
sampling, so almost surely the eigenvalue A^ is simple. 

Denote Oi the p x d matrix whose A:-th column is the vector Uk ■ We measure the 
deviation of the d-dini subspace of W spanned by u^., k = 1, . . . , d, from i^T^,. by 



min \\0 e^-0\\HS, 
oeo(d) 



(B.52) 



where is a p x c? matrix whose fc-th column is Vk (recall that Vk is the fc-th standard 
unit vector in M^). Let O be the dx d orthonormal matrix 



O = 



Then, 



dxd 



min ||Of e, - 0\\hs < 6, - 0||hs = 0(epcA^/'), 
oeo{d) 
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(B.53) 



which completes the proof for points away from the boundary. 

Next, we consider Xi € Ai ^ epcA - proof is almost the same as the above, so 
we just point out the main differences without giving the full details. The notations 
Si, Fj{k,l), pk,i{n,a), Yj(k,l) refer to the same quantities. Here the expectation of 
Fj{k, I) is: 



EF{k,l) 



V^PCA 



{xi)nM 



Ke-pcAi^i, y){''{y) - i-{xi),Vk){i.{y) - L{x^),vi)p{y)dV{y). 

(B.54) 



Due to the asymmetry of the integration domain exp~-'^(_B ^ epcA (^») ^ ■^) when Xi is 
near the boundary, we do not expect EFj{k,l) to be the same as (B.36) and (B.37), 
since integrals involving odd powers of 9 do not vanish. In particular, when / = 
d + 1, . . . ,p, k — 1, . . . , d or k = d + 1, . . . ,p, I — 1, ... ,d, (B.54) becomes 



EF{k,l) 



K 



t 



cxp-i(B^jj^(a;.)nA1) V PCA 



{i.e, Vk){U{e, 6), vi)p{x,)t''+^dtd9 + 0(epcA''/'+') 

(B.55) 



Note that for Xi £ A4 ^ epcA the bias term in the expansion of the covariance matrix 
differs from (B.44 1 when l = d+l,...,p, k — l,...,dork — d+l,...,p, l = l,...,d. 
Similar calculations show that 

EF{k,l) = 



0(epcA'*/2+i) 



when fc, Z = 1, . . . , d, 
when k,l ~ d + 1, . . . ,p, 



0{epcA'^/'^+^/'^) otherwise. 



(B.56) 



0(epcA''/^+^) when fc,? = 
E[F{k,lf]^{ 0{epcA'^/^+^) whcnk,l = d+l,...,p, (B.57) 
0(epcA''^^^^) otherwise. 



and 



0(epcA''/^+^) whenfc,/ = l,...,d, 
VarF(/c,0=<( 0(epcA'^/^+'') when k,l ^ d + 1, ... ,p, (B.58) 
0(epcA'^/^+''^) otherwise. 

Similarly, Yj(k,l) are also bounded random variables satisfying 

{O(epcA) for fc, Z = 1, . . . ,d, 
O(epcA') forfc,Z = d+l,...,p, (B.59) 
0(epcA'^^^) otherwise. 

Consider first the case k,l — 1, . . . ^ d, for which Bernstein's inequality gives 

PkAn,a) < exp (-- ,t;,,^^f^, A , (B.60) 



From (B.60 1 it follows that w.h.p. 

a = 



I epcA ' 
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provided (B.42 1. Similarly, for k,l = d + 1, ... ,p, we have 



PkA{n,a) < exp 
which means that w.h.p. 



d/i+2\ 



1/2 



provided (|B.42|). Finally, for k = d + 1. . . . ,p, I — 1. . . . , d or I 

[n- \)a^ 



k = 1 , . . . , cZ, we have 

Pk,i{n,a) < exp 
which means that w.h.p. 



0(epcA''/2+3) + 0(epcA3/2)a 



= O 



epcA 
ni72 



d/4+3/2 



provided (B.42). As a result, under the condition in the statement of the theorem for 

2 

the sampling rate, namely, epcA = 0(^1 we have w.h.p. 

d/2+l 



= epcA 



0(1) 




+epcA 



(i/2+3/2 



epCA 



d/2+l 



0(1) 0(1) 

0(1) 0(epcA^/') 

0(1) Odxp-d 

Op— dxd Op—dxp— d 



epcA 



d/4+1 



0(1) 0(epcAi/') 
0(epcA'/') O(epcA) 

0(epcA^/^) 0(epcA^/2) ' 
0(epcA^/') O(epcA) 



Then, by the same argument as in the case when Xi ^ M ^ epcA ^ conclude that 

min ||Ofe,-0||ffs = 0(epcA'/'). 

OGO(d) 

Similar calculations show that for epcA — 0(n^3+T) -we get 

min ||0fe,-0|lHS-0(epcA'/4) 
oeO{d) 



for Xi ^ M 



for Xi e A^viF^. 
□ 



and 



min ||Ofe,-0||HS = 0(epcA'/') 
oeo(d) 



B.3. [Proof of Theorem B.2 . Proof. Denote by Oi the p x d matrix whose 
columns ui{xi), I = 1,. . . ,d are orthonormal inside M.P as determined by local PCA 
around Xi. As in ( |B.3 1, we denote by ei{xi) the l-th column of Qi, where Qi is a 



px d matrix whose columns form an orthonormal basis of l^Tx-A4 so by Theorem |B.l 
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\\OfQi — Id-Wns — '^(^p'ca) £PCA — 0{n ^'+2 ), which is the case of focus here (if 
epcA = Oin-^^) then \\OfQ, - Id\\Hs - 0(e^(,\)). 

Fix Xi and the normal coordinate {di}f^^ around Xi so that t^,di{xi) = ei{xi). Let 
Xj = exp^. to, where 9 g Tx^M., \\9\\ = 1 and t = 0{y/e). Then, by the definition of 
the paraUel transport, we have 



.Xj 

^i^j) = '^g{X{Xj),Pxj,Xidl{Xi))dl{3 



(B.61) 



1=1 



and since the parallel transport and the embedding t are isometric we have 

g{P,,^,^X{xj),di{xij) = g{X{xj),P,^,,M^i}) = {i^*X{xj), L,P.,^,xAi^^)) ■ (B.62) 

Local PC A provides an estimation of an orthonormal basis spanning l^T^-M, 
which is free up to 0{d). Thus, there exists R E 0{p) so that i^T^^^A^ is invariant 
under R and e;(xj) = Ri*Pxj,xidi{xi) for all Z = 1, . . . , d. Hence we have the following 
relationship: 



{L^.X{Xj), i*Px,,x,dl{Xi)) = C^{'^*^{^])^<^k{Xj))ek{Xj), i*Px,,xA{Xi)) 

k=l 

d 

= '^{i'*X{xj),ek{xj)){ek{xj), i*Pxj,xAi^i)) 

k=l 
d 

= ^{Ri-*Pxj,x,dk{xi), L^Px.^x.di{xi)){L^X{xj),ek{xj)) 

k=l 
d 

:= '^Ri^k{i*X{xj),ek{xj)) := RX^, 



(B.63) 



k=l 



where i?;,fc := {Ri*Px,,xidk{xi), i*Px,,xA{^i)) ^ R ■= [Ri,k\i,k=i andXj = {{L^X{xj),ek{xj))Yk^^. 
On the other hand, Lemma [B.9| gives us 



Lk=l 



i*Px,,xA{Xt)'^Rl'* Pxj ,xAk{Xi) 



n{^Al{x^)fRi.Px,,xAk{x^ 



l,k=l 
d 

Lk=l 



2W^,^x^')U{e,^fRL,Px^,xAk{x^)+S/eU{0,^l{x^)fRL*Px,,xAk{x^) 

1 

- {i'*Pxi.xiTl{0,di{xi))0) Rl^P.^ ,^Ak{xi) 

(B.64) 



We now analyze the right hand side of (B.64) term by term. Note that since i^,Tx^M. 
is invariant under R, we have Ri*Pxj.xidk{xi) = ^"^^i Rr.k''*Pxj.xidr{xi). For the 
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0{t) term, we have 



r=l 

d 

: J2 Rr,kU{9, di{x,)f [L,dr{x,) + tn(0, + 0{t^)] (B.65) 

r=l 
d 



where the second equahty is due to Lemma B.9 and the third equaUty holds since 
n(0, di{xi)) is perpendicular to L^.dr{xi) for alH, r = 1, . . . , d. Moreover, Gauss equa- 
tion gives us 

= (7^(0,0R(x,),5^(a;,)) = n(0,9,(xO)^n(0,a.(xO)-n(0,a,(a;,)rn(0,ai(a;,)), 



which means the matrix 5*1 



n{e,di{x,)Y\i{e,dr{x,)) 



Lr=l 



is symmetric. 



Fix a vector field X on a neighborhood around Xi so that X{xi) = 9. By definition 
we have 

Vo,U{X,X)=Vo,mX,X))-2n{X,Wo,X) (B.66) 

Viewing TM as a subbundle of TW, we have the equation of Weingarten: 

Va, (n(X, X)) = -An(x.x)di + (n(X, X)), (B.67) 

where AYi{x,x)di and Vgj(n(X, X)) are the tangential and normal components of 
Vgi{n{X, X)) respectively. Moreover, the following equation holds: 



{Auix.x)di,i.dk) - {n{di,dk)Mx,x)). 



(B.68) 



By evaluating (B.66) and (B.67) at Xi, we have 

d 



V Q^(x^)Ii{6,e)'^ Rb*Px^,Xidk{Xi) ^YRr,k^ di(x,)^{0,0)'^ i^P^.^^.driXi) 

r=l 

d 



r=l 



r— 1 r— 1 

(B.69) 

where the third equality holds since 11(6', Wg^i^^.-fX) and Wg^^^ ~^(n(X, X)) are perpen- 
dicular to L^,di{xi) and the last equality holds by (B.68). Due to the symmetry of the 



second fundamental form, wc know the matrix 5*2 = 
is symmetric. 
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{U{d,e),U{di{x,),dr{x,))) 



Lr=l 



Similarly we have 

d 

Vgn{di{x,), efRu^P^^^^M^,) = RrAMod^.),e)Sf'^*9rix,) + 0{t). (B.70) 



r=l 

Since {AYi(di(xi)fi)OY' i'*dr{xi) = Il{e,di{xi)Y'T\{e,dr{xi)) by ( |B 68| , which we de- 
noted earlier by Si and used Gauss equation to conclude that it is symmetric. 

To estimate the last term, we work out the following calculation by using the 
isometry of the parallel transport: 

d 

{.i*Pxj,Xi'R-{0 ,dl{Xi))e)^ RL^Pa:^,Xidk{Xi) = ^ Rr,k{i*Pxj ,Xi{'R-{0 , dl{Xi))9) , L^Px^ ,xidr 



X] Rr,k9{Px,,xAT^i^^ di{x,))9), Px^,Xidr{Xi)) = ^ i^r,fcg(7^(6', di{Xi))d, dr{x,)) 

(B.71) 



r=l 



Denote 5*3 

n. 



g{Ti{e,di{x{))e,dr{x,)) 



id 



, which is symmetric by the definition of 



Substituting ( |B.65| ), ( |B.69| , ( |B.70[ ) and ( |B.7l[ ) into ( |B.64[ ) we have 

QlQj + t^i-Si - S2/S + Si/6 - S3/6)R + 0{t^) = R + t^SR + Oi^), (B.72) 

where S := —Si — 52/3 + Si/6 — S3/6 is a symmetric matrix. 

Suppose that both Xi and a;^ are not in A^^. To finish the proof, we have to 
understand the relationship between OfOj and QfQj, which is rewritten as: 



(B.73) 



From (B.l) in Theorem B.l we know 



11(0, - Q^fQ^\\HS - \\OfQ, - Id\\HS = Oiel^^A)^ 

which is equivalent to 

{O, - Q, fQ, = 0(ep/cA)- 
Due to (|B.72l) we have Qj = QiR + t'^QiSR + 0{t^), which together with (|B.74|) gives 



(B.74) 



(O. - Q^fQJ = (O, - Q^fiQ^R + t^Q.SR + 0(t3)) = 0(e3/?^ + e3/2). (B.75) 



Together with the fact that Qf = RQj + t'^SRQj + 0{t^) derived from (B.72), we 
have 

Of (O, - Q,) = QfiOj ~ Qj) + (O, ~ Q^fiO, - Qj) (B.76) 

- (i?gj + i25i?gj + 0{r')){0j - q,) + (o, - Q,)^(Oj - g^) 
= o(e^/c\ + £3/2) + (o^ _ g,)^(o, - g,) 



Recall that the following relationship between Oi and Qi holds (|B.48|) 

o, = g,; 



O(epcA) 



(B.77) 
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when the embedding i is properly translated and rotated so that it satisfies L{xi) = 
0, the first d standard unit vectors {wi, . . . , u^} C W' form the orthonormal basis 
of LifTx^M: and the normal coordinates {dk]'l^i around Xi satisfy L^,dk{xi) = v^. 
Similarly, the following relationship between Oj and Qj holds (B.48) 



0(4(?a) 
O(epcA) 



(B.78) 



when the embedding l is properly translated and rotated so that it satisfies b{xj) = 
0, the first d standard unit vectors {vi^...^Vd\ C MP form the orthonormal ba- 
sis of i^,TxjM, and the normal coordinates {dk}'l^i around Xj satisfy Lifdk{xj) = 
Ri-*Pxj,xidk{xi) — Vk- Also recall that i^T^^M is invariant under the rotation R 
Ckixi) and ek{xj) are related by ek(xj) — ek{xi) + 0{y/e). 



and from Lemma 
Therefore, 



B.9 



0,-Q, = {R + 0{V~e)) 



O(epcA) 



O(epcA) 



(B.79) 



when expressed in the standard basis of MP so that the first d standard unit vectors 
. . . ,Vd} C MP form the orthonormal basis of Lt^T^-M. Hence, plugging (B.79) into 
(B.76) gives 



OfiO,-Q,] 



0(4(?a) + iO^ 



0(4(:'a) 



Inserting ( |B.76[ ) and ( |B.80| ) into (|B.73|) concludes 

oTo 



Qi Qj 



PCA 



^3/2 



(B. 



(B.81) 



Recall that Oij is defined as Oij — UV^ , where U and V comes from the singular 
value decomposition of OfOj, that is, OfOj = UT,V^ . As a result. 



Oij = argmin ||Oj - 0||^^ 



argmin 

= argmin WqJQj + 0(ep(?A + e"^^") - R^O\\hs 

0£0{d) 

= argmin \\Id + t^lfSR + 0{ey^A + ^^^^ 



PCA 



lfO\\HS- 



Since R"^ SR is symmetric, we rewrite R^^SR — U'SU'^ , where U is an orthonormal 
matrix and E is a diagonal matrix with the eigenvalues of R^SR on its diagonal. 
Thus, 



Id + t'R' SR + 0(ep{?A + e^^') - O = U{Id + t'j:)U' + 0(e 



^3/2 



3/2 
PCA 



6^/2) - R^O. 



Since the Hilbert-Schmidt norm is invariant to orthogonal transformations, we have 

\\Id + fR^SR + 0(ep{.A + e^/') " R^O\\hs = Wild + f^)U^ + 0(ep(?A + ^^'^) - R^O\\h. 

= \\Id + i^S + 0(e^(.\ + £3/2) _ u^rTquWhs- 



Since U^R^OU is orthogonal, the minimizer must satisfy U'^ R^OU ~ Id+Oie'^^ 



PCA 



as otherwise the sum of squares of the matrix entries would be larger. Hence 



we conclude Oij = R + 0(ep'cA + ^ 
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Applying (B.63I and (B.48), we conclude 

y^.W^d ^ , ,3/2^ 



R{{L,X{x,),ei{x,) + 0(e^'cA)»f=i + 0{ 



= {{i,P,^^,^X{x,), ui{xi)))1^^ + 0(e3/J^ + £3/2) 

This concludes the proof for points away from the boundary. 

When Xi and Xj are in M ^ epcA i ^^'^ same reasoning as above we get 



0,jXj = {{i,P^^^^^X{xj),ui(x,)))'l^^ + 0{ 



1/2 I ,3/2n 



This concludes the proof. We remark that similar results hold for £pcA — 0{n "i+i 
using the results in Theorem |B.1[ □ 



B.4. [Proof of Theorem B.3) . Proof. We demonstrate the proof for the case 
when the data is uniformly distributed over the manifold. The proof for the non- 
uniform sampling case is the same but more tedious. Note that when the data is 
uniformly distributed, q, = Tg o for all < a < 1, so in the proof we focus on 
analyzing := T^^q. Denote :— K^^q. Fix Xi ^ -M^/j^^. We rewrite the left hand 
side of (B.8I as 

Yl"=l,j^lKe{Xi,XJ)O^JXj ^ ;r3T E"=l (B 82) 



where 



Since Xi, X2, ■ ■ ■ , Xn are i.i.d random variables, then Gj for j ^ i are also i.i.d 
random variables. However, the random vectors Fj for j ^ i are not independent, 
because the computation of Oij involves several data points which leads to possible 
dependency between O^^j and Oij^. Nonetheless, Theorem B.2 implies that the ran- 
dom vectors Fj are well approximated by the i.i.d random vectors Fj that are defined 
as 

Fj (x„x,) {{L.,P.,^^,,^X(x^),Ul{x,))Y^^^ , (B.83) 

and the approximation is given by 

Fj = F; + K,{x,, Xj)Oiel^cA + e'^'). (B.84) 

where we use £pcA = 0{n^^) (the following analysis can be easily modified to 
adjust the case £pcA = 0(n^3+T)). 

Since Gj, when j ^ i, are identical and independent random variables and -Fj, 
when J 7^ i, are identical and independent random vectors, we hereafter replace Fj 

53 



and Gj by F' and G in order to ease notation. By the law of large numbers we should 
expect the following approximation to hold 



n-1 = Sj^lj^i EG 

where 



(B.85) 



and 



EG= / i^,(a;„y)dy(y). (B.87) 

In order to analyze the error of this approximation, we make use of the result in 
[35] (equation (3.14), p. 132) to conclude a large deviation bound on each of the d 
coordinates of the error. Together with a simple union bound we obtain the following 
large deviation bound: 



Pr 



n-l X]j = l j/i Pj EF' 



-^y" , .,.G,- EG 



G2(n-l)a2e'^/2vol(A^) 

> a > < d exp < — ; 



ui{xi))\\'^ + 0(e)] 
(B.88) 

where Gi and G2 are some constants (related to d) . This large deviation bound implies 
that w.h.p. the variance term is Q( y^i/2gd/4-i/2 )■ As a result. 



-o( ,,. +4(:A + e^/^ 



,l/2gd/4-l/2 



which completes the proof for points away from the boundary. The proof for points 

inside the boundary is similar. 

□ 

B.5. [Proof of Theorem B.4| . Proof. We begin the proof by citing the follow- 
ing Lemma from [9l Lemma 8]: 

Lemma B.IO. Suppose f £ C^{M) and x ^ M/^, then 



e-'"^K,{x,y)J{y)dV{y) = mo/(x) + e"^ 
B^(x) d 



^^+w{x)f{x) 



+ 0(e^) 



where w{x) ~ s(x) + 2^'5^i- 1 1 , s{x) is the scalar curvature of the manifold at x, mi = 
/b,(o) ll^ll'^dl^^llM^^' ^i(O) = e : ll^^lk'^ < 1}; rn[ - S^^^^^\\x\\' K' {\\x\\)Ax, 
z{x) = /gd-1 |jn(0, 0)||d0, andii is the second fundamental form of Jv[ atx. 

Without loss of generality we may assume that mo = 1 for convenience of notation. 
By Lemma [B.10[ we get 

P.iv) = P{V) + + w{y)p{y)^ + 0(e2), (B.89) 
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which leads to 



pjy) 
p?{y) 



m2 / Ap(y)\ 



O(e^). 



(B.90) 



Plug (B.90) into the numerator of T^ aX{x): 

K,{x, y)P,,yX{y)p:"{y)p{y)dV{y) 
K,{x,y)P,,yX{y)p^-^{y) 



=^p:-{x) 

= p-'^{x) 

=pr{x) 

TO2e 



1 - "^"^ ( ■^(y) 



Ap(y) \ 
2p(y) J 



d\^(y)+0(e'^/2+2) 



i^,(a;,y)P,,,X(yy-"(y)dy(y) 



K{x,y)P,,yX{y)p^-"{y) (w{y) + ^) dV{y) + 0{e'/'+') 



- p-"ix)A ~ '^ap-''{x)B + 0(e'^/2+2) 



where 



A ■■= K.{x, y)P,,yXiy)p^-'^{y)dV{y), 

B ■■= !s^^^)K,{x,y)P...yX{y)p'-"{y) (w{y) 



My) 



dViy). 



Note that the a-normalized integral operator (B.9) is evaluated by changing the in- 
tegration variables to the local coordinates, and the odd monomials in the integral 
vanish because the kernel is symmetric. Thus, applying Taylor's expansion to A leads 
to: 

r-v^ r 



A 



K 



K' 



o - 



X{x) + VgX{x)t + VlgX{x)- + 0(^3) 



p'-^{x) + V,(/-")(x)t + ^l,[p^-){x)- + 0{t^) 



[f^-^ + Ric(6i, 9)t'^+^ + 0(i'^+2)] dM^ 



+ X{x) 



24^6 / 



dM6l 



^ / t 



VL(p^-")(a;)— dM0 



"2 



"^if VgX{x)Vg{p^-"){x)t'^+^dtd9 + Oie'^/^+^) 



From the definition of z{x) it follows that 

1 , / t \ ||n(0,0)||t'^+2 



24Ve 
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dtde = 



,d/2+l / 



m'^z(x) 



2A\S'i- 



Suppose {Ei}f^^ is an orthonormal basis of T^Ad, and express 9 = X]f=i (^i^i- A 
direct calculation shows that 



/ 



and similarly 



S' 



[ Ric(6», e)de = 



d-ll 



-s{x). 



Therefore, the first three terms of A become 
em2 A(p^~")(a;) eTO2 



1 + 



d 2pi-«(a;) d 
The last term is simplified to 



+ ^-^w{x) ) X{x) + ^V^X(x) 



(B.91) 



Next, we consider B. Note that since there is an e in front of i?, we only need to 



to simplify 



consider the leading order term. Denote Q{y) = p "(y) yw(y) -r 2p(y) J 
notation. Thus, applying Taylor's expansion to each of the terms in the integrand of 
B leads to: 



B 



[ KAx,y)P,,yX{y)Q{y)dV{y) 

, ( t \ ||n(6i,6i)||t3 



K 



t 

7^ 



K' 



O 



[X{x) + WeX{x)t + 0{t^)\ 



[Q{x) + V0Q(a-)t + 0{t^)\ [f^-^ + Ric(6', 0)1'^+^ + 0{t'^+'^)\ dtdO 
= e'^/^X{x)Q{x) + 0(e'^/2+i) 

In conclusion, the numerator of T^_aX{x) becomes 

eTO2 



d 



A{p^-"){x) Ap{x) 
— a 



Xix) 



2p^-"{x) 2p{x) 

+ e''/^+''^p7'^{x)p'-'^{x)V^X{x) + e<^/2+i^J^p-"(^) [ VeX{x)Ve{p'-''){x)de + 0{e''/^+^) 
Similar calculation of the denominator of the Tf^aX{x) gives 
/ K,„{x,y)p{y)dV{y) 



1 — ae 



, , Ap{y) \ 



= p7"(x) / K,{x,y)p'-'^{y) 
= p7"{x) [ K,ix,y)p'-"{y)dViy) 

Jb^,{x) 

'apj'^ix) [ K,{x, y)p'-''{y) (wiy) + dV{y) + 0(. 



dV{y) + 0{e'"^+^) 



m2e 



A/2+2 



em2 



p-''{x)C - ^^ap-"{x)D + 0(e'^/2+2) 

56 



where 

We apply Lemma B.10| to C and D: 



and 



In conclusion, the denommator of Tg^aX{x) is 



Putting all the above together, we have 



T,^„X(a;) = + e— V + e^^^r^^^ pi-a(^) + ) 

In particular, when a = 1, we have: 

□ 

B.6. [Proof of Theorem |5.2| . Proof. Suppose miuj^gawi d(a;, y) = e. Choose a 
normal coordinate . . . ,9^} on the geodesic ball B^i/2{x) around x so that xq = 
exp^(e9d(x)). Due to Gauss Lemma, we know span{9i(a;o), . . . , dd-i{xQ)} — T^gdM. 
and dd{xo) is outer normal at xq- 

We focus first on the integral appearing in the numerator of T^,iX{x): 

[ -^K,,Ax,y)P,,yX{y)p{y)dV{y). 

We divide the integral domain exp^^ {B ^{x) n A^) into slices S*,, defined by 

Sri ^ {{u,ri) eM.'^ : || (ui, . . . , u^-i, < ^e}, 

where rj G [— e-'^/^, e^/^] and u ~ {ui, . . . ,Ud-i) G M'^^-'^. By Taylor's expansion and 
(B.90I, the numerator of T^.iX becomes 

K,,i{x,y)P^.yX{y)p{y)AV{y) 

B^(x)r\M 

p:\x) I I ' k[ Vll"ir^^r j [xix)+Y^^u,\/a^X{x)+v\/a,X{x) + 0{e)j 

drjdu 



1 - ( w{v) 





- rf' 


Ap(y)\ 


+ C 


2p(y) ) 




v/|l«ll'^ 


- 





p-'ix) / / k[ ^ " ^ ^ + J2 u^^d.X{x) + ryVa,X(a;) + 0(e) dr/dw 



(B.92) 
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Note that in general the integral domain 5,, is not symmetric with related to (0, ... , 0, rj), 
so we will try to symmetrize 5^ by defining the symmetrized slices: 



Sri — (^i^iiRiSr; n Srf), 



where Ri{ui, . . . , u^, . . . , 77) = (ui, . . . , —Ui, . . . , 77). Note that from (B.23 ) in Lemma 
B.9| the orthonormal basis {Pxo,xdi{x), . . . , Pxo,xdd-i{x)} ofTxgdM differ from {di{xo), . . . , dd-i{xo)} 



by 0(e). Also note that up to error of order e'^/^, we can express dM n B^i/2{x) by 
a homogeneous degree 2 polynomial with variables iPx n.xdij x), . . . , Px„,xdd-i{x)}. 
Thus the difference between 5^ and 5,, is of order e and (B.92) can be reduced to: 



p-\x) 



X{x) + J2 u^Vo,X{x) + ri'sJo^Xix) + 0(e) dr^du 

(B.93) 



Next, we apply Taylor's expansion on X{x): 

Px.xoX{xa) - X{x) + eVo,X{x) + 0(e). 

Since 

Va,X{x) = Px^xoi^a.Xixo)) + 0{e'/'), 
the Taylor's expansion of X{x) becomes: 

X{x) = Px,xoiXixo) - eVg.Xixo) + 0(e)), 
Similarly for all i = 1, . . . , d we have 

Px,xoi'^9^Xixo)) = Vg^Xix) + 0(ei/2) 
Plugging ( |B.94p and ( |B.95[ ) into ( |B.93[ ) further reduce ( |B.92[ ) into: 



(B.94) 



(B.95) 



p-\x) 



Sr, J ~\fl 



Px^xo ( X{xo) + u,V3^X{xo) + (?? - i)Va^X{xo) + 0(e)J drydw 

(B.96) 



1=1 



The symmetry of the kernel implies that for i = 1, 



wMm = 0, 



and hence the numerator of Ti^^X{x) becomes 

p-\x)Px^xMlX{xo) + m\Vd,X{xo)) + Ole-^/^+i) 

where 



(B.97) 



(B.98) 



(B.99) 
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and 



mi 



Similarly, the denominator of T^^iX can be expanded as: 

B^{x)r]M 

which together with (B.98) gives us the following asymptotic expansion: 
r,,iX(x) = P,,,„ (x{xo) + ^Va,X(a;o)] +0(e). 



(B.lOO) 



(B.lOl) 



(B.102) 



Combining ( B.102[ ) with (B.lOl in Theorem B.3 we conclude the theorem. □ 

B.7. [Proof of Theorem 5.3 . Proof. We denote the spectrum of by 
{Xi}f^Q, where < Aq < Ai < . . ., and the corresponding eigenspaces by Ei := 
{X e L'^{TM) : V^X ^ -XiX], / = 0, 1, . . .. The eigen-vector-fields are smooth and 
form a basis for L^{TA4), that is, 

L\TM) = (BieNu{o}Ei. 

Thus we proceed by considering the approximation through eigen-vector-field sub- 
spaces. To simplify notation, we rescale the kenrel K so that — 1- 
Fix Xi E El. When x ^ M^, from Corollary B.5 we have uniformly 

I^.l^^M^^^V^Xiix) + 0{e). 
e 

When X G A4^, from Theorem 5.2 and the Neumann condition, we have uniformly 

(B.103) 



T,^iXi{x)^P,,,,Xiixo) + Oie). 



Note that we have 



Xi{xo) = Xi{x) + Px,xoV~eyd^Xi{xo) + 0(e), 
thus again by the Neumann condition at xq, (B.103) becomes 

T,,iXi{x)^Xi{x) + Oie). 
In conclusion, when x E Ai^ uniformly we have 

Tm^^iM^:^^ = 0(1). 

Note that when the boundary of the manifold is smooth, the measure of is 
0{e^^^). We conclude that in the sense, 

Te,lXi — Xi 



V^X, 
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(B.104) 



Next we show how T^^^ converges to e~*^^ . We know / + eV^ is invertible on 
El with norm | < ||/ + eV'^|| < 1 when e < Next, note that if _B is a bounded 
operator with norm ||_B|| < 1, we have the following bound for any s > by the 
binomial expansion: 



\{i + By-i\\ = 



sB 



<4B\\ 
= s\\B\\ 



- 1) ^2 



1) 



1 



1 



2! ' 
s - 1 

2! 
s - 1 

1! 



B\\^ 

\B\\~ 
\B\\- 



s{.s-l){s-2) ^3 

3^ ^ 

K.s-l)(5-2), 



3! 




(.-!)(.- 




3! 




(.-!)(.- 


2) II 



B\ 



2! 



BW 



BW 



(B.105) 



< s\\B 

= s\\BUl + \\B\\) 
On the other hand, note that on Ei 

e*^' = (/ + eV2) 

Indeed, for X e Ei, we have e*^'x = (1 - t\i + + . . .)X and (/ + eV^)^X = 

{l-tXi+t^Xf/2-teXf/2 + .. .)X by the binomial expansion. Thus we have the claim. 
Put all the above together, over Ei, for all / > 0, when e < we have: 



0(6) 



(B.106) 



IT. 



< 



{I- 
(I 



1 + 



0{e) 



/ + 0(e 



= (l + i + 0(e))(6i/4i + 0(e)) = 0(eJ), 



where the fir st equality comes from (B.104) and (B.106I, the third inequality comes 

<0(ei/4)on^ 



from (B.105 1. Thus we have \\Tl^ - e 
the proot is completed. □ 



j^Ei. By taking e — 0, 



Appendix C. Multiplicities of eigen-l-forms of Connection Laplacian 
over S*". All results and proofs in this section can be found in [IS] and [35]. Consider 
the following setting: 



G = SO{n + 1), SO{n), M = G/K = S*" 



.so(n + l), i = so{n) 



Denote £7^(5"") the complexified smooth 1 forms, which is a G-module by {g ■ s){x) = 
9'^{9~^^) for 9 & G, s G 57^(5'"), and x G S"". Over M we have Haar measure dfj, and 
Hodge Laplacian operator A = dS + Sd. Since A is a self-adjoint and uniform second 
order elliptic operator on r2P(S'"), the eigenvalues are discrete and non-negative 
real numbers, with only accumulation point at oo, and their related eigenspaces Ei 
are of finite dimension. We also know ®°ZiEi is dense in 17^(5'") in the topology 
defined by the inner product {f,g)s" = ^ s^^f ^ 3^*^^^ where (•,•) is the left invariant 
hermitian metric defined on S"". 

Since A is a G-invariant differential operator, its eigenspaces Ex are G-modules. 
We will count the multiplicity of E\ by first counting how many copies of E\ are inside 
ri^(S'") through the Frobenius reciprocity law, followed by the branching theorem 
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and calculating dimE\. On the other hand, since 5" is a symmetric space, we know 
A = —C over Q.^{S"), where C is the Casimir operator on G, and wo can determine 
the eigenvalue of C over any finite dimensional irreducible submodule of by 
Preudenthal's Formula. Finally we consider the relationship between real forms of g 
and complex forms of g. 

Note that g/i Or C ^ when G = SO{n + 1) and K = SO{n). Denote 
y = C" = AP(g/t) (8)r C as the standard representation of SO{n). 

There are two steps toward calculating the multiplicity of cigenforms over S". 

Step 1 Clearly f2^(S'") is a reducible G-module. For A S Irr(G, C), construct a 
G-homomorphism 

HomG(rA,l^^(5"))(^c Tx^n^s^) 

by (/) (g) M- (l){v). We call the image the Fx-isotypical summand in ri^(S'") with 
multiplicity dime Home (F^, ^^^(5"")). Then we apply Frobenius reciprocity law: 

HomG(rA,r!i(5")) ^ Hom;^(resgrA, AV) 

Thus if we can calculate dimHom;i-(res^rA, A^V), we know how many copies of the 
irreducible representation F;^ inside f2^(5'"). To calculate it, we apply the following 
facts. When Vi and Wj are irreducible representations of G, we have by Schur's 
lemma: 

Denote Li...L„ the basis for the dual space of Cartan subalgebra of so{2n) or 
so{2n + 1). Then Li. ..!/„, together with ^ J2^=i generate the weight lattice. The 
Weyl chamber of S0(2n + 1) is 



aiLi : ai > 02 > ... > a™ > o| , 



and the edges of the W arc thus the rays generated by the vectors Li, L1+L2, Li + 
L2... + Ln, for 50(271), the Weyl chamber is 



= l^ajZ/i : ai > 02 > ... > |a„|| , 



and the edges are thus the rays generated by the vectors Li,Li + L2, Li + L2... + 
Ln-2,Li + L2... + Ln and Li + L2... - L^. 

To keep notations unified, we denote the fundamental weights oji separately. 
When G = SO{2n), denote 

luq = when p = 

= ^^^^ A, when l<p<n— 1 (CI) 
^n = I t2i=i when p = n; 

when G = S0{2n + 1), denote 

Wo = when p = 

LOp = X]f=i when 1 < p < n — 2 

LOn-i = \ {r^iZl - when p = n (^-2) 

'^n = \ {TaZi + A„) when p = n 
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Theorem C.l. 

(1) When m — 2n+ 1, the exterior powers K^V of the standard representation V 
of so{2n + 1) is the irreducible representation with the highest weight ujp, when p < n 
and 2ujn when p = n. 

(2) When m — 2n, the exterior powers A^V of the standard representation V of 
so{2n) is the irreducible representation with the highest weight LOp, when p < n — 1; 
when p ~ n, K^V splits into two irreducible representations with the highest weight 
2ujm~i and 2ujm- 

Proof. Please see [T3] for details. □ 
Theorem C.2. (Branching theorem) 

When G = S0{2n) or G = S0{2n + 1), the restriction of the irreducible rep- 
resentations of G will he decomposed as direct sum of the irreducible representations 
of K = S0{2n — 1) or K — S0{2n) in the following way. Let Tx be an irreducible 
G-module over C with the highest weight A — X]r=i ^i^i ^ ^ 

then as a K -module, T\ decomposes into K -irreducible modules as follows: 

(1) ifm^2n (G = SO{2n) and K = SO{2n - 1) 

where runs over all A'^ such that 

Ai > A'l > A2 > A'2 > ... > A;_i > |A„| 

with \\ and Ai simultaneously all integers or all half integers. Here ^Y7^-i>'' Li the 
irreducible K-module with the highest weight X]"=i ^'i^i 

(2) ifm = 2n + l (G = SO{2n + 1) and K = SO{2n) ), 

Tx = x[u 

where ® runs over all X[ such that 

Ai > A'l > A2 > A'2 > ... > A;_i > A„ > |A'J 

with X'l and Ai simultaneously all integers or all half integers. Here r^"_^>,'i. is the 
irreducible K-module with the highest weight X]r=i A^^i 
Proof. Please see [U] for details. □ 

Based on the above theorems, we know how to calculate dime HomG(rA, Vi^{My). 
To be more precise, since the right hand side of Homi<-(res^rA, h^V) is the irreducible 
representation of K with the highest weight wi (or splits in the low dimension case), 
we know the dimHomx(res^rA, A^F) can be 1 only if res^F^ has the same highest 
weight by Schur's lemma and classification theorem. Please see [TS] for details. 

Step 2 In this step we relate the irreducible representation Vx C £7^(5'") to 
the eigenvalue of eigenforms. Consider the Laplace operator on SO{n + 1), which is 
endowed with a bi-invariant Riemannian metric. Since SO{n + 1) is semi-simple, we 
can take the metric on g to be given by the negative of the Killing form: {X, Y) = 
~tT{adXadY). The Laplace operator A related to this metric is the Hodge-Laplace 
operator which enjoys the following relationship between its eigenvalue and its related 
highest weight. 

Theorem C.3. Suppose C ft^{S") is an irreducible G-module with the highest 
weight pL, then we have 



A = (/i + 2p,Ai)/dr, 
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where / G F^, p is the half sum of all positive roots, and (•, •) is induced inner product 
on the dual Cartan subalgebra of g from the Killing form B. 
Proof. Please see [35] for details. □ 

Note that for S0{2n+1), p = X^ILi ('^ + 5 ~ since i?+ = {Li — Lj,Li + Lj , Li : i < j}; 
for SO{2n), p = J27=i ~ since i?+ = {Li — Lj,Li + Lj : i < j}. 

Combining these theorems, we know if is an irreducible representation of G, 
then it is an eigenspace of A with eigenvalue X — {p, + 2p, p). In particular, if we can 
decompose the eigenform space E\ C Q^{S^) into irreducible G-module, we can not 
only determine the eigenvalue A but also its multiplicity. Indeed, we know E\ — F®-'^, 
where X = {p + 2p,p), is the isotypical summand of F^ inside $7^(5"). 

Step 3 Now we apply Weyl Character Formula to calculate the dimension of F^ 
for G. 

Theorem C.4. 

(1) When m = 2n+l, consider X — XiLi, where Ai > ... > A„ > 0, the high- 
est weight of an irreducible representation T\ . Then dim Fa = IIi<j ^i<j 27-1+1-1- j ' 
where U = Xi + n — i + 1/2. In particular, when X — ujp, p < n, dim Fa — Cp"^^ . 

(2) When m — 2n, consider X = X]"=i where Ai > ... > |A„|, the highest 
weight of an irreducible representation Fa- Then dim Fa = '''jj/ 2n-i-j ' ''^^^^^ 
li = Xi + n ~ 1. In particular, when X = ujp, dim Fa = G^" when p < n and 
dimVx = G2;72. 

Proof. Please see [TS] for details. □ 

Step 4 We need the following theorem about real representations of G to solve 
the original problem: 

Theorem C.5. (1) When n is odd. Let a;, be the highest weight of the repre- 
sentation A'^V OS so2n+iC. For any weight X = OiWi + ... + a„_ia;„_i + a„w„/2 or 
so2n+iC, the irreducible representation T\ with highest weight X is real if Qn is even, 
or if n or 3 mod 4; if a„ is odd and n = 1 or 2 mod A, then V\ is quaternionic. 

(2) When n is even. The representation T\ of so2„M with highest weight X = 
aiUJi + ... + an-2^n-2 + OLn-i'^n-i + O'li'^n wHl be complcx if n is odd and a„_i 7^ a„; 
it will be quaternionic if n = 2 mod 4 and a„_i + a„ is odd; and it will be real 
otherwise. 

Proof. Please see [T5] for details. □ 

Combining the following Tables C.l and C.2 and this Theorem we know all the 
eigen-l-form spaces are real form. 

Step 5 Now we put all the above together. All the eigen-l-form spaces of S"" 
as an irreducible representation of SO{n -f 1), n > 4, are listed in the table C.l and 



C.2 A is the highest weight. The and S"^ cases are listed in C.3 and C.4 is 



separated since it has a different highest weight kLi — L2, which happens in general 
for n-forms in S"^""*"^. 

Consider SO{?>) for example. In this case, n = 1 and Ai — S^. From the analysis 
in cryo-EM [TS], we know that the multiplicities of eigenvectors are 6,10,..., which 
echoes the above analysis. 

In conclusion, we can see the following compatible first few multiplicities: 
S^: 6, 10, 14 
S^: 4, 6, 9, 16, 16 
S^: 5, 10, 14 
S^: 6, 15, 20 
5^: 7, 21, 27 
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Table C.l 

Eigenvalues and their multiplicity of S^", where n >2. li = Xi + n — i and mj = n — i. 



A 


eigenvalues 


multiplicity 


kLi, fc > 1 


fc(fc + 2n- 1) 


n ^ 


kLi + L2, k>l 


(A; + l)(A; + 2n-2) 


n ^ 



Table C.2 

Eigenvalues and their multiplicity of 5^"+^, where n > 2. li = Xi + n — i + 1/2 ond mj = 
+ 1/2. 



A 


eigenvalues 


multiplicity 


kLi, k>l 


kik + 2n) 


,'2 ,2 

n ' ' n 


kLi+L2, k>l 


{k + l){k + 2n-l) 


,2 ,2^ 

n ^ n 



Table C.3 
Eigenvalues and their multiplicity of . 



A 


eigenvalues 


multiplicity 


kLi, k > 1 


k{k + 2) 


{k + iy 


kLi + L2, k>l 




k{k + 2) 


kLi — L2, k > 1 


{k+ir 


k{k + 2) 



Table C.4 
Eigenvalues and their multiplicity of . 



A 


eigenvalues 


multiplicity 


kLi k>l 


k{k + l) 


2(2fc + 1) 
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